Introduction to Statistics


1.1 Introduction 

The word statistics was first used by a German scholar, Gottfried Achenwall,
about the middle of the 18th century as the science of statecraft concerning the
collection and use of data by the state. According to another pioneer statistician, Yule,
the word statistics occurred at the earliest in the book “The elements of universal
erudition” by Baron (1770). It was used again with a rather wider definition in 1787
by E.A.W. Zimmermann in “A political survey of the present state of Europe”. It
appeared in Encyclopedia Britannica in 1797 and was used by Sir John Sinclair in
Britain in a series of volumes published between 1791 and 1799 giving a statistical
account of Scotland. In the 19th century, the word statistics acquired a wider meaning,
covering numerical data of almost any subject whatever and also the interpretation of
data through appropriate analysis. The word data is used for numerical facts, and a
single numerical fact is a datum.


Since the early 1920s, with the growth in the experimental sciences, there has been a
need for reliable scientific methods for analyzing the results of experiments and
surveys. The modern subject of statistics evolved with many of the early developers
of statistical methodology being experimenters themselves, motivated by their
own practical problems.

Although the gathering and presenting of information is still an important part
of statistics, modern statistics is quite different from that of the early days. Statistics
nowadays includes probability theory and applied mathematics. Computers have
made it possible to perform routinely statistical analyses which could not have
been contemplated in the past. Computers do not form a part of statistical theory but
may be useful in applying statistical theory to solve a practical problem.


Statistics has been defined as the mathematical science of making decisions
and drawing conclusions from data in situations of uncertainty. It includes
designing of experiments, collection, organization, summarization, analysis and
interpretation of numerical data.


In the above definition of statistics, we have only considered its scientific
meaning. In day to day usage, the word statistics refers to numbers or facts, such as
statistics of births, statistics of deaths and statistics of road accidents etc. In some
other situations, it has a symbolic meaning such as “do not become a statistic on the
next weekend”. The word statistics is also the plural of ‘statistic’, which is a statistical
term for a quantity calculated from the sample values.


Before proceeding further, some statistical terms and notations need to be 
defined and discussed. We explain each of the terms through the following example: 


Example 1.1: It was observed that out of 500 college students surveyed, 300 were
females. Is there evidence that more students in this college are females?


1.1.1. Definitions 


Population: The total group under discussion, or the group to which the results will
be generalized, is called the population. In Example 1.1, the set of all students in the
college is our population.


Sample: Sometimes the measurement of interest cannot be made on the whole
population, so we choose a subset of the population to draw inferences about the
population. If the inferences from sample to population are going to be meaningful, it
is imperative that the sample should be representative of the population. In
Example 1.1, the set of 500 students, being a subset of all college students, is our
sample.


Ratio: The ratio of A to B is the fraction A/B. In Example 1.1, there are 300
females and the remaining 200 are males. So, the male-to-female ratio is 200/300.


Proportion: A proportion is a special ratio of a part to its total. In Example 1.1,
the proportions of females and males are 300/500 and 200/500 respectively. A proportion
becomes a percentage when multiplied by 100.


The female percentage in Example 1.1 is (300/500) x 100, i.e., 60%. 60% means
60 out of 100. The symbol % is an abbreviation for percent.
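
The arithmetic behind the ratio, proportion and percentage definitions can be checked with a few lines of Python. This is only a small sketch using the counts from Example 1.1; the variable names are illustrative and not part of the text.

```python
# Counts taken from Example 1.1.
sample_size = 500
females = 300
males = sample_size - females            # the remaining 200 students

# Ratio of A to B is the fraction A/B.
male_female_ratio = males / females

# A proportion is a part divided by its total.
female_proportion = females / sample_size
male_proportion = males / sample_size

# A proportion becomes a percentage when multiplied by 100.
female_percentage = female_proportion * 100

print(f"male-to-female ratio: {male_female_ratio:.3f}")
print(f"female proportion:    {female_proportion:.2f}")
print(f"male proportion:      {male_proportion:.2f}")
print(f"female percentage:    {female_percentage:.0f}%")
```

Note that the two proportions add up to one, because every student in the sample falls in exactly one of the two groups.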


Parameter: It is a quantity computed from a population when the entire population is
available. Parameters are fixed or constant quantities and are not usually known. In
Example 1.1, the proportion of female students in the population is our parameter.


Statistic: It is a quantity computed from a sample. In Example 1.1, the proportion
of female students in the sample of students is a statistic. A statistic is a variable
quantity because it varies from sample to sample.


Sampling Variability: A sample is only a part of a population, and many different
samples can be drawn from the same population. Therefore, samples from the same
population need not be identical; e.g., with reference to Example 1.1, another
sample of 500 students surveyed may not contain 300 female students.
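
Sampling variability can be seen by drawing several samples from one population and counting the females in each. The sketch below assumes a hypothetical college of 10,000 students of whom 60% are female; these figures are assumptions made for illustration only and do not come from Example 1.1.

```python
import random

random.seed(1)  # fixed seed so that repeated runs give the same samples

# Hypothetical population (an assumption): 6,000 females and 4,000 males.
population = ["F"] * 6000 + ["M"] * 4000


def female_count(pop, n):
    """Draw n students without replacement and count the females."""
    sample = random.sample(pop, n)
    return sum(1 for student in sample if student == "F")


# Five independent samples of 500 students each.
counts = [female_count(population, 500) for _ in range(5)]
proportions = [c / 500 for c in counts]

print("females per sample:", counts)
print("sample proportions:", proportions)
```

The population proportion (the parameter) stays fixed at 0.6 under these assumptions, while the sample proportion (the statistic) generally changes from one sample to the next.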


Experiment: Any study in which the scientist can control the allocation of treatments
to the experimental units is called an experiment. Every unit must be capable of
receiving every treatment, and the decision as to which unit receives which treatment
is determined by an allocation mechanism. This mechanism could be that the scientist
observes the unit and then decides which treatment to apply, which would be an
unsatisfactory allocation mechanism because of the possibility of subjective bias,
or it could be to assign the treatments according to a rule, which would be
scientifically acceptable. So, not all allocation mechanisms lead to acceptable
inferences.


Sample Surveys: In a sample survey, there are no treatments. The units in the
population under study are listed in a frame and a sample of units is selected from the
frame using a selection mechanism. So, the distinguishing feature of a survey is the
control over the selection of units.


The features which distinguish experiments and sample surveys are control over
allocation and control over selection.


Constant: Quantities which do not vary from individual to individual are called
constants, e.g.,


$\pi = 3.14159$, $e = 2.71828$, 4, 25 etc.


Order Statistic: The order statistics (OS) of data $Y_1, Y_2, Y_3, \dots, Y_n$ are just the
arrangement of the data in order of magnitude. They are denoted by $Y_{(1)}, Y_{(2)}, Y_{(3)}, \dots, Y_{(n)}$.
$Y_{(1)}$ is the minimum of $Y_1, Y_2, Y_3, \dots, Y_n$ and $Y_{(n)}$ is the maximum of $Y_1, Y_2, Y_3, \dots, Y_n$.
$Y_{(1)}$ is the minimum order statistic and $Y_{(n)}$ is the maximum order statistic. If the data
are arranged in increasing order of magnitude, the data are said to be arranged in
ascending order and if the data are arranged in decreasing order of magnitude, the
data are said to be arranged in descending order.
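
Order statistics amount to sorting. The short sketch below uses five illustrative values (an assumption chosen for the example) and Python's built-in `sorted` to obtain the ascending and descending arrangements.

```python
# Five illustrative observations (assumed values, listed in the order collected).
data = [87, 90, 92, 89, 95]

# Ascending order gives the order statistics Y(1), Y(2), ..., Y(n).
order_stats = sorted(data)

y_min = order_stats[0]     # Y(1), the minimum order statistic
y_max = order_stats[-1]    # Y(n), the maximum order statistic

# Descending order is the same arrangement reversed.
descending = sorted(data, reverse=True)

print("ascending: ", order_stats)
print("descending:", descending)
print("minimum:", y_min, "maximum:", y_max)
```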


Model: It is a mathematical statement used in studying the results of an
experiment or predicting the behaviour of future repetitions of the experiment.

The models involve probability distributions which describe the variability in
the characteristic of interest in the population and, therefore, the variability we might
expect to get in different samples.


In Example 1.1, a binomial model is appropriate, where the parameter is the
proportion of students who are females in the population.


A common model for describing the makeup of an observation states that it
consists of a mean plus an error, so an observation can be described by means of
the simplest model as:


$Y_i = \mu + \varepsilon_i$ (1.1)


where $Y_i$ represents any individual observation, $\mu$ (a Greek letter read as mu)
represents the average or mean of the population and $\varepsilon_i$ (a Greek letter read as epsilon)
represents a random error.

Random Error ($\varepsilon_i$): Random error is the chance variation in an observational process. It
should not be confused with its synonym in common usage, mistake, which means
human error.

Equation (1.1) can be written as:


$\varepsilon_i = Y_i - \mu$ (1.2)


The $\varepsilon_i$'s are usually assumed to be from a population having zero mean. As the
random errors sum to zero, positive and negative deviations balance one another, and
there would typically be roughly equal numbers of each.


The term $(Y_i - \mu)$ is known as the deviation of an observation $Y_i$ from the mean $\mu$.
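
The relation between observations, mean and deviations can be verified numerically. In the sketch below a tiny set of five values is treated as the whole population, which is an assumption made purely so that the population mean can be computed exactly.

```python
# Treat these five assumed values as a complete (tiny) population.
population = [87, 90, 92, 89, 95]

# Population mean (mu).
mu = sum(population) / len(population)

# Deviations e_i = Y_i - mu, as in equation (1.2).
deviations = [y - mu for y in population]

# Equation (1.1): every observation is the mean plus its deviation.
rebuilt = [mu + e for e in deviations]

total_deviation = sum(deviations)
n_positive = sum(1 for e in deviations if e > 0)
n_negative = sum(1 for e in deviations if e < 0)

print("mean:", mu)
print("deviations:", [round(e, 1) for e in deviations])
print("sum of deviations (zero up to rounding):", round(total_deviation, 10))
print("positive:", n_positive, "negative:", n_negative)
```

The deviations from the mean always sum to zero (apart from floating-point rounding), whereas the counts of positive and negative deviations are only roughly balanced.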


Equation (1.1) represents the simplest form of the linear additive model. The
$\mu$ may be a single mean,


1.1.2 Notations


Sigma ($\Sigma$): It is a Greek letter and is used as a short-hand notation for sum. For
example, if we have to add five numbers $Y_1, Y_2, Y_3, Y_4$ and $Y_5$, then

$Y_1 + Y_2 + Y_3 + Y_4 + Y_5 = \sum_{i=1}^{5} Y_i$


This tells us to sum all $Y_i$ values starting at $i = 1$ (the lower limit) and
stopping at $i = 5$ (the upper limit). It is always assumed that consecutive integer
values are to be summed unless otherwise specified. Other examples are


i) $\sum_{Y \in D} Y^2$ where $D = \{2, 3, 4, 7\}$:
$\sum_{Y \in D} Y^2 = 2^2 + 3^2 + 4^2 + 7^2$

where $\in$ is read as belongs to.

ii) If P(Y) denotes the probability then
$\sum_{Y \in D} P(Y) = P(2) + P(3) + P(4) + P(7)$
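
The summation sign corresponds directly to Python's built-in `sum`. In the sketch below the five Y values and the probabilities attached to the members of D are hypothetical, and the sum of squares over D is included as an assumed illustration of summing over a set rather than over consecutive integers.

```python
# Five hypothetical observations Y1..Y5 (assumed values).
Y = [2, 3, 5, 1, 4]

# Sum of Y_i for i = 1, ..., 5 (Python indexes from 0, so i - 1 is used).
total = sum(Y[i - 1] for i in range(1, 6))

# Summation over the members of a set D instead of consecutive integers.
D = {2, 3, 4, 7}
sum_of_squares = sum(y ** 2 for y in D)      # 2^2 + 3^2 + 4^2 + 7^2

# Hypothetical probabilities P(Y) for each member of D (assumed values).
P = {2: 0.1, 3: 0.2, 4: 0.3, 7: 0.4}
sum_of_probabilities = sum(P[y] for y in D)  # P(2) + P(3) + P(4) + P(7)

print("sum of Y_i:", total)
print("sum of squares over D:", sum_of_squares)
print("sum of probabilities over D:", round(sum_of_probabilities, 10))
```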


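Product ($\Pi$): It is a Greek letter (read as pi) and is used here as a short-hand
notation for product. For example,


$\prod_{i=1}^{n} Y_i = Y_1 \times Y_2 \times Y_3 \times \dots \times Y_n$


If we have only three observations
$Y_1 = 2$, $Y_2 = 3$, $Y_3 = 5$,


then $\prod_{i=1}^{3} Y_i = Y_1 \times Y_2 \times Y_3 = 2 \times 3 \times 5 = 30$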


n!: Read as n factorial and is defined as:
$n! = n(n-1)(n-2) \dots (3)(2)(1)$
$\Rightarrow 4! = 4 \cdot 3 \cdot 2 \cdot 1 = 24$


and $1! = 1$.
Note that $0! = 1$.
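
Both notations have direct counterparts in Python's standard `math` module (`math.prod` requires Python 3.8 or later). The sketch below reuses the three observations from the product example and also writes the factorial out as an explicit product for comparison; the helper name `factorial_by_product` is illustrative.

```python
import math

# The three observations from the product example.
Y = [2, 3, 5]
product = math.prod(Y)            # Y1 x Y2 x Y3

# Factorials from the standard library.
four_factorial = math.factorial(4)
one_factorial = math.factorial(1)
zero_factorial = math.factorial(0)


def factorial_by_product(n):
    """n! written out as the product n(n-1)...(2)(1); an empty product is 1."""
    return math.prod(range(1, n + 1))


print("product:", product)
print("4! =", four_factorial, "1! =", one_factorial, "0! =", zero_factorial)
print("4! by explicit product:", factorial_by_product(4))
```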


1.2 Importance Of Statistics In Various Disciplines 


Since the information collected in the form of data (observations) from any
field will almost always involve some sort of variability (uncertainty), this subject
has applications in almost all fields of research. Researchers use statistics in the
analysis, interpretation and communication of their research findings. Some examples
of the questions which statistics might help to answer, given appropriate data, are:


i) how much better a yield of wheat do we get if we use a new fertilizer as
opposed to a commonly used fertilizer?

ii) are a company’s sales figures likely to increase in the next quarter?

iii) what dose of an insecticide can be used successfully to monitor an insect
population?

iv) what is the likely weather in the coming season?

It is obvious that statistics has its application in almost every field where
research is carried out and findings are reported.


When Statistics is applied in Economics, it is called Econometrics. When it is
applied in biological sciences, it is called Biometry. Similarly, there are Psychometry and others.
We give a brief account of its application in different fields as follows:


1.2.1 Social Sciences 


In social sciences, one of the major objectives is to establish the relationships that
exist between certain variables. This end is achieved by postulating hypotheses
and testing them using different statistical techniques.


Most of the areas of our economy can be studied by econometric models
because they help in forecasting, and forecasts are important for future planning.


1.2.2 Plant Sciences 


The most important aspect of statistics in plant sciences research is its role in the
efficient planning of experiments and in drawing valid conclusions. A technique in
Statistics known as ‘Design of Experiments’ helps in introducing new varieties.
Optimum plot sizes can be worked out for different crops like wheat, cotton,
sugarcane and others under different environmental conditions using statistical
techniques.


1.2.3 Physical Sciences


The application of Statistics in physical sciences is widely accepted.
Researchers use these methods in the analysis, interpretation and communication of
their research findings. Linear and nonlinear regression models are used to model
cause and effect relationships between different variables. Computers have also
facilitated experimentation, and it is now possible to simulate a
process rather than carry out the experiment.


1.2.4 Medical Sciences 


The interest may be in the effectiveness of new drugs, the effect of environmental
factors, heritability, standardization of various records and other related problems.
Statistics comes to the rescue. It helps to plan the next investigation in order to get
trustworthy information from limited resources. It also helps to analyze the
current data and integrate the information with that previously existing.


1.3. Variables 


A characteristic that varies from individual to individual in a population is
called a variable. Nature abhors constancy, so natural phenomena show
variability. For example, plant height, number of plants per plot, eye colour (black,
blue, green) etc.


Let Y represent the variable and $Y_i$ (read as Y subscript i) represent the ith
observation. The values $Y_1, Y_2, \dots, Y_n$ form a set of n observations on the variable
Y. For example, we measure the height of 5 wheat plants and observe that the height
of the first plant is 87 cm, the height of the second plant is 90 cm, the height of the third plant is 92 cm,
the height of the fourth plant is 89 cm and the height of the fifth plant is 95 cm.
Here, $Y_1 = 87$ cm and $Y_5 = 95$ cm.


A variable may be fixed or mathematical when its value can be determined
beforehand, e.g., the amount of fertilizer to be applied to a plot or the amount of insecticide
applied to control insect pests. A variable may be random when its value cannot be
exactly determined, e.g., the yield from a plot or the response to an insecticide. Variables
are usually of two types:


i) Quantitative variable ii) Qualitative variable


1.3.1 Quantitative variable


A quantitative variable is one which is capable of assuming a numerical value.
For example, height of plants, weight of grains or number of students in a class.


Quantitative variables can further be placed into two types depending upon
the type of measurement possible.


i) Continuous variables ii) Discrete variables


A continuous variable is one that can take all possible values in an interval on
the number line. For example, atmospheric pressure, plant height, student height and
temperature.


A discrete variable is also known as a discontinuous variable. It is one that can
take only isolated points on the number line. Usually these values are positive
integers as these arise from counting. For example, number of students in a class,
number of plants per plot, number of insects in a unit area, number of grains per
plant.


1.3.2 Qualitative or Categorical variable 


A qualitative variable, also known as a categorical variable, is one which cannot
be measured numerically. An observation is made when an
individual is allocated to one of several mutually exclusive categories. Observations
falling in each class can only be counted. For example, sex (either male or female),
general knowledge (poor, moderate, good), colour (blue, green, red etc.).


Example 1.2: A sample of 5 students from a class was selected and each one of them
was asked which brand of soap they use. Their responses were as follows:


Lux, Rexona, Lux, Capry, Rexona

Identify the type of variable.


Solution: As we can only categorize these observations into different brands of soap,
the observations arise from a categorical variable.
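
Because categorical observations can only be counted, the natural summary of the responses in Example 1.2 is a count per category. The sketch below uses `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# Responses from Example 1.2.
responses = ["Lux", "Rexona", "Lux", "Capry", "Rexona"]

# Count how many observations fall in each mutually exclusive category.
counts = Counter(responses)

for brand, frequency in sorted(counts.items()):
    print(f"{brand:<8}{frequency}")
```

Arithmetic on the brand names themselves would be meaningless; only the counts carry numerical information.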


Example 1.3: There are 7 sections of class XI in an intermediate college. The number
of students in each section is as follows:

41, 45, 47, 37, 35, 45, 42

Identify the type of variable.

Solution: This data set arises from a quantitative variable because the observations
are numerical values. Further, the variable is discrete because the observations are
isolated points on a number line.


1.4 Descriptive And Inferential Statistics 


Generally, data obtained from experiments and surveys are in a rough and crude form
and need to be organized and summarized in order to be understood.
This is where descriptive statistics comes in to help us. It provides procedures for:

i) organizing the data collected from the sample,

ii) summarizing the data. It includes graphic representation and calculation of
summary values, like measures of central value and measures of variability, that
we call statistics (mean, proportion, variance),

iii) presenting the summaries in a form understandable to others.


One may be interested in generalizing the results of the data. For example, based
upon the descriptive statistics, one might wish to estimate the value that a
measure of central value would have taken had data been gathered about all the subjects possessing the
given character rather than a sample. This procedure of inferring about the
characteristics of the population based upon the characteristics of its sample is called
inferential statistics.


The discipline underpinning inferential statistics is probability theory, because our
generalizations of results always involve some risk. Thus, we always make
probabilistic statements and we do not really prove anything. For example, we say that
the probability is high that an experimental variable affected the dependent variable.


The whole issue of descriptive and inferential statistics can be described with
the help of what we call the Statistical Problem.


The distinguishing feature of a statistical problem is that we are trying to say
something about a population based on a sample from the population only, taking into
account the variability within the samples. Before we do this, we must make some
assumptions about the manner in which the data were produced (these are embodied in
a statistical model). Based on the model and data, statistical methods are designed
which allow us to work back and make statements about the underlying population.
The main aspects of the statistical problem are:


i) a clear definition of the population of interest and objectives,

ii) the design of experiment or the sampling procedure,

iii) the collection, organization and analysis of data,

iv) the selection of a suitable model and the process of making statements about the
population based on sample information.


Figure 1.1 illustrates the main aspects of statistics. Often the
underlying population will be clearly defined; in other situations the population may
only be hypothetical, corresponding to what might have happened in an infinite series
of repetitions of the experiment concerned.


[Diagram labels: Population (interest is in some parameter describing the population);
Experiment; Sample (data gathered on a subset of the population); Model (list of
assumptions about the way the data are collected); Inference about the population.]


Figure 1.1: Statistical problem - Real world situation


1.5 Sources of Data


Keeping in view the objectives of the study, data are collected by individual
research workers or by organizations through sample surveys or experiments. The
data collected may be


i) primary data ii) secondary data


By primary data we mean the raw data, which have just been collected from the
source and have not gone through any kind of statistical treatment like sorting and
tabulation.


By secondary data we mean data which have already been collected by
someone and have undergone a statistical treatment like sorting and tabulation etc.


1.5.1 Sources of primary data 


The sources of primary data are primary units like basic experimental units,
individuals and households, and the following methods are usually used to collect data
from primary units. The method used depends very much on the nature of the primary
unit.


i: Personal Investigation: The researcher conducts the experiment or survey personally
and collects data from it. The data collected are generally accurate and reliable. This is
only feasible in the case of small scale laboratory or field experiments or pilot surveys and
is not practicable for large scale experiments and surveys because it would take too
much time.


ii: Through Investigators: Trained investigators are employed to collect the data.
In the case of surveys, they contact the individuals and fill in the questionnaires after
asking for the required information. A questionnaire is an inquiry form having a number
of questions designed to obtain information from the respondents. This method is
employed by most organizations. It gives reasonably
accurate information but it is very costly.


iii: Through questionnaire: The required information is obtained by sending a
questionnaire by mail to the selected individuals, who fill in the questionnaire and
return it to the investigator. This method is cheap but the non-response rate is very high, as
most of the respondents do not bother to fill in the questionnaire and send it back.


iv: Through local sources: Local representatives or agents are asked to send the
requisite information, which they provide based upon their own experience.
This method is quick but gives only rough estimates.


v: Through telephone: The information may be obtained by contacting the
individuals by telephone. This method is quick and gives accurate information.


vi: Through internet: With the introduction of information technology, people
may be contacted through the internet and asked to provide the
pertinent information.


It is important to go through the primary data and locate any inconsistent 
observations before it is given a statistical treatment. 


1.5.2 Sources of Secondary Data


The secondary data may be available from the following sources.


i: Government organizations: Federal and Provincial Bureau of Statistics, Crop
Reporting Service-Agriculture Department, Census and Registration Organizations
etc.


ii: Semi-government organizations, e.g., Municipal committees, District Councils,
commercial and financial institutions like banks etc.


iii: Teaching and research organizations.


iv: Research journals and newspapers.


v: Internet.


Exercise 1 (Answers on page 250)


1.1 Define the word statistics and explain its different meanings.


1.2 Define the following terms:

i) Population and sample ii) Parameter and statistic


1.3 Distinguish between qualitative and quantitative variables.


1.4 i) Write the following using a summation sign with appropriate index:

a) $Y_1 + Y_2 + \dots + Y_n$ b) $Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2$
c) $(Y_1 - \mu) + (Y_2 - \mu) + (Y_3 - \mu)$ d) [expression illegible in source]

ii) Expand the following summation and product signs:

a) $\sum_{i=1}^{5} Y_i$ c) $\sum_{i=1}^{5} Y_i^2$

[parts b), d), e), f) and g) are illegible in source]


1.5 Classify the following as categorical, discrete or continuous variables:

i) Sex of an insect. ii) Weights of plants.
iii) Major crops of Pakistan. iv) Level of satisfaction.
v) Teaching standards. vi) Temperature measured in Fahrenheit.


1.6 Explain in detail the main aspects of a Statistical problem.


1.7 Define Descriptive and Inferential Statistics and differentiate between them.


1.8 Distinguish between primary and secondary data and give different sources
from which these are obtained.


1.9 Fill in the blanks:

i) The purpose of the sample is to draw inference about ________.
ii) Proportion is always ________ or equal to one.
iii) The quantity computed from population is called ________.
iv) The quantity computed from sample is called ________.
v) The quantity which does not vary from individual to individual is called ________.
vi) Sum of the random errors is equal to ________.
vii) A variable is called ________ when its value cannot be exactly determined.
viii) A variable that takes numerical values is called ________ variable.
ix) The procedure of inferring about the population characteristics using the
sample is called ________.
x) First hand collected data is called ________.


1.10 Against each statement write T for true and F for false statement.

i) The value calculated from the population is called parameter.
ii) Statistics deals with single fact.
iii) Statistics can be placed in relation to each other.
iv) Primary data and ungrouped data are same.
v) A collection of observations is called data.
vi) A variable can assume the same value.
vii) A discrete variable can assume finite values between two given limits.
viii) Height measurements of students is quantitative variable.
ix) The word Statistics is at present used in four ways.
x) A small part of the population is called sample.




2.1 Introduction


Most scientific experiments are conducted in an attempt to answer some specific
questions and generally result in the collection of data in the form of batches of numbers,
usually referred to as a sample. To extract information from the sample, there is a need to
organize and summarize the collected data. There are many ways of describing a sample.
Commonly, we use either a graph or a small set of numbers which summarize some
properties of the sample such as its centre and spread.


2.2 Classification


Classification is the process of arranging observations into different
classes or categories according to some common characteristics.


The data may be classified by one or more characteristics at a time. When data 
is classified according to one characteristic, it is called one-way classification. When 
the data is classified by two characteristics at a time, it is called two-way 
classification. Similarly, the data classified by three characteristics is called three-way 
classification. 


2.3 Tabulation 


The process of making tables or arranging data into rows and columns is 
called tabulation. Tabulation may be simple, double, triple or complex depending 
upon the number of characteristics involved. Tables are the most common form of 
documentation used by the scientists. 


2.3.1 Construction of tables

The following are the parts of a table, of which the first four are the main parts.


i) Title: A title is the heading at the top of the table. The title should be brief and self
explanatory. It describes the contents of the table.


ii) Column Captions and Box head: The headings for different columns are called
column captions and the part of the table containing column captions is called the box head. The column
captions should be brief, clear and arranged in order of importance.


iii) Row Captions and Stub: The headings for different rows are called row captions 
and the part of the table containing row captions is called Stub. Row captions should 
be brief, clear and arranged in order of importance. 


iv) Body of the Table and Arrangements of the data: The entries in different cells 
of column and rows in a table are called body of the table. It is the main part of the 
table. The data may be arranged qualitatively, quantitatively, chronologically, 
geographically or alphabetically. 


v) Source Note: Source notes are given at the end of the table and indicate the
compiling agency, the publication, and the date and page of the publication.


vi) Spacing and ruling: To enhance the effectiveness of a table, spacing and ruling
are used. They are also used to separate certain items in the table. Thick or double lines or
single lines are used to separate row captions and column captions. To indicate no
entry in a cell of the table, dots (...) or dashes (---) are used. Zeroes are not used in a
table for this purpose.


vii) Prefatory Notes and Footnotes: The prefatory note is given after the title of the
table and the footnotes are given at the bottom of the table. Both are used to explain the
contents of the table. The footnotes are usually indicated by asterisks (*).


2.4 Frequency Distribution 


To extract information from a data set, the first and most important step is to present it in a
compact form. A frequency distribution is a compact form of data in a table which
displays the categories of observations according to their magnitudes and frequencies
such that similar or identical numerical values are grouped together. The categories
are also known as groups, class intervals or classes. The number of values falling in a
particular category is called the frequency of that category. It is usually denoted by f.


The relative frequency of a category, denoted by r.f., is the proportion of the
observed frequency to the total frequency and is obtained by dividing the observed
frequency by the total frequency. The sum of the relative frequencies should be
one (1) except for rounding error. Relative frequencies are important for making
comparisons between two or more distributions. Otherwise, the different sample sizes
of the data sets may distort comparisons.
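
To see why relative frequencies make distributions of different sizes comparable, consider the following sketch. The two frequency tables are hypothetical (the category labels and counts are assumptions made for illustration), and `relative_frequencies` is an illustrative helper, not a library function.

```python
def relative_frequencies(freqs):
    """Divide each observed frequency by the total frequency."""
    total = sum(freqs.values())
    return {category: f / total for category, f in freqs.items()}


# Two hypothetical frequency tables with different sample sizes (50 and 200).
group_a = {"poor": 10, "moderate": 25, "good": 15}
group_b = {"poor": 40, "moderate": 100, "good": 60}

rf_a = relative_frequencies(group_a)
rf_b = relative_frequencies(group_b)

for category in group_a:
    print(f"{category:<10}{group_a[category]:>5}{rf_a[category]:>8.2f}"
          f"{group_b[category]:>6}{rf_b[category]:>8.2f}")

print("sum of r.f. (group A):", round(sum(rf_a.values()), 10))
print("sum of r.f. (group B):", round(sum(rf_b.values()), 10))
```

The raw frequencies of the two groups look very different, yet their relative frequencies coincide, which is the kind of comparison the raw counts would obscure.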


The frequency distribution may be made for continuous data, discrete data
and categorical data.


The following steps are taken into account while making a frequency distribution
for continuous data:


i) Calculate the range of the data, where

Range = maximum value in the data - minimum value in the data


ii) Decide about the number of classes. The minimum number of classes may be
determined by the formula


Number of classes $c = 1 + 3.3 \log(n)$ (2.1)
or $c = \sqrt{n}$ (approximately) (2.2)
where n is the total number of observations in the data and log denotes the
logarithm to base 10.


This gives roughly the number of classes. There are certain other 
formulae suggested to decide the number of classes. Classes are the groups of 
data values constituting the frequency table. Usually, the classes of equal 
width are defined by the numerical limits or boundaries. Each class has a 
starting point called its lower limit and its end point called its upper limit. 


The class limits are the end points of the class intervals both included 
in class interval. It is convenient to choose the end points of the class interval 
so that no observation falls on them. This can be obtained by expressing the 
end points to one more place of decimal than the observations themselves. For 
this purpose, class limits are usually converted to class boundaries to achieve 
continuity in the grouped data. This is done by expressing the upper limit of 
the first class to one more decimal place without changing the width of the 
class and starting the second class from the same value as is the end of the 
first class and so on. The upper values in the classes are included in the next 
class so that the classes do not overlap. 


The number of classes is important. We should neither make too few
wide classes, in which most of the variation in the data is lost, nor
have too many narrow classes, in which the real values in the data are hardly
grouped.


iii) Decide about the width of the class. It is usually abbreviated by h and is obtained
by the following relation:

$h = \dfrac{\text{range}}{\text{number of classes}}$ (approximately) (2.3)


It should be noted that a convenient nearby number is always chosen and
it is not necessary to follow the rules of rounding because we are mainly
grouping the data.


iv) The decision about the starting point of the first class is arbitrary. Usually, it is
started before the minimum value in such a way that the mid point, the average of the
lower and upper class limits of the first class, is properly placed.


v) Now, an observation is taken and a vertical bar (tally mark) is made against the class
to which it belongs. A running tally is kept till the last observation. A group of four
vertical bars crossed by a fifth stroke indicates five.


Example 2.1: Student - Height Data 


The heights (in cm) of 30 students measured at the time of registration are given by

91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112.

Make a suitable frequency distribution.


Solution: To construct a frequency distribution, proceed as follows:

i) Range = Maximum value - minimum value (2.4)
In this data the maximum value is 112 and the minimum value is 87.
So, Range = 112 - 87 = 25

ii) The approximate number of classes or class intervals is given by
number of classes = 1 + 3.3 log(30)
= 1 + 3.3 (1.4771)
= 5.87443
= 6 (approximately)

iii) Width of the class interval (h) = range / number of classes
h = 25/6
= 4.167
= 5 (approximately)

5 is chosen for convenience; one may take 4 if one wishes.


iv) The minimum value is 87. We start the first class from 86 with a class width of 5, so our first class is 86-90 with mid point 88, the average of the lower and upper class limits, i.e., (86+90)/2 = 88. Similarly, the other classes are 91-95, 96-100, ..., 111-115. It is clear that the maximum value 112 is included in the last class.


v) It is convenient to choose the end points of the class interval so that no observation falls on them. This can be obtained by expressing the end points to one more place of decimal than the observations themselves. Therefore, suitable class boundaries for this data would be 85.5-90.5, 90.5-95.5, ..., 110.5-115.5. In the class boundaries, the upper values of the classes are included in the next class so that the classes are mutually exclusive, i.e., 90.5 is the upper value of the first class and the lower value of the second class. In counting, it would be included in the second class interval.


The class centres y_i are the middles of the classes. The class centres are also known as mid points or middle points and are obtained by averaging either the class limits or the class boundaries, i.e., y_1 is the middle of the first class:

y_1 = (85.5 + 90.5)/2 = 88

The other mid points are 93, 98, ..., 113 respectively.


Starting from the first observation, all the 30 observations are assigned to the classes to which they belong. For example, the observation 87 falls in the first class 86-90, so a tally mark is made in the tally column against this class. The observation 90 also belongs to the first class 86-90, so another tally mark is made against this class, and so on; the last observation 112 belongs to the last class 111-115. The number of tally marks in the tally column against each class gives the frequency of that class. The frequency distribution is given in Table 2.1.





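Table 2.1: Tally count and frequency distribution for the example 2.1.

Class limits    Class boundaries    Mid point (y)    Tally          Frequency
86-90           85.5-90.5           88               ||||/ |        6
91-95           90.5-95.5           93               ||||           4
96-100          95.5-100.5          98               ||||/ ||||/    10
101-105         100.5-105.5         103              ||||/ |        6
106-110         105.5-110.5         108              |||            3
111-115         110.5-115.5         113              |              1
Total                                                               30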


It is clear from the frequency table that 6 students have heights between 85.5 and 90.5 cm, 4 students have heights between 90.5 and 95.5 cm and so on, and 1 student has a height between 110.5 and 115.5 cm.

The relative frequency for a class can be computed by dividing its frequency by the total frequency. The frequency distribution with relative frequencies is given in Table 2.2.


Table 2.2: Frequency distribution with relative frequencies

Class boundaries    Frequency    Relative frequency
85.5-90.5           6            6/30 = 0.200
90.5-95.5           4            4/30 = 0.133
95.5-100.5          10           10/30 = 0.333
100.5-105.5         6            6/30 = 0.200
105.5-110.5         3            3/30 = 0.100
110.5-115.5         1            1/30 = 0.033

It should be noted that the sum of the relative frequencies is one except for rounding error.
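
The steps of Example 2.1 can be retraced with a short Python sketch. It is only an illustration under stated assumptions: the variable names are hypothetical, the logarithm is taken to base 10 (which matches log(30) = 1.4771 in the solution), and the class width 5 and starting point 86 are the convenient values chosen in the text rather than values computed by the code.

```python
import math
from collections import Counter

# Heights (cm) of the 30 students from Example 2.1.
heights = [91, 89, 88, 87, 89, 91, 87, 92, 90, 98, 95, 97, 96, 100, 101,
           96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107, 105, 106, 107, 112]

n = len(heights)
data_range = max(heights) - min(heights)       # 112 - 87
n_classes = round(1 + 3.3 * math.log10(n))     # rule used in the example
raw_width = data_range / n_classes             # about 4.167 before choosing 5
width = 5                                      # convenient width chosen in the text
start = 86                                     # lower limit of the first class (text's choice)

# Class index of each observation: 0 for 86-90, 1 for 91-95, and so on.
counts = Counter((x - start) // width for x in heights)

rows = []
for k in range(n_classes):
    lower = start + k * width
    upper = lower + width - 1
    f = counts.get(k, 0)
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),
        "mid_point": (lower + upper) / 2,
        "frequency": f,
        "relative_frequency": f / n,
    })

for r in rows:
    print(r["limits"], r["boundaries"], r["mid_point"],
          r["frequency"], round(r["relative_frequency"], 3))
```

Because the observations are whole numbers, assigning by class limits and assigning by class boundaries give the same classes here.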


For discrete data: In case of discrete data, each observation is a whole number. So, 
while making a frequency distribution, the possible values are written in a column 
and a tally count of each value is made for the data. The number of tally count for 
each value is its frequency. The corresponding relative frequency is obtained by 
dividing each frequency by the total number of observations. The sum of the relative 
frequencies should be 1 except for rounding error.


For categorical data: In case of categorical data, the categories are placed in a 
column and a tally count is made for each category going through the data set which 
gives the frequency of each category. 


Example 2.2: The observations about the number of rotten potatoes from twenty 
equal sized samples taken from a store are available as follows: 

1, 25 Ay See Nyse Rhy bs Opes Ne ty oy Jy Uys f5 oD 

Make a frequency table.

Solution: The tally count and frequency table are made by going through each observation of the data and, for each observation, making a mark (a vertical bar |) against the appropriate value of the variable. In this data, the values of the variable vary from 0 to 4. These are written in a column and a tally count is kept while going through the whole data. The resulting frequency distribution is given in Table 2.3.

Table 2.3: Tally count and frequency distribution for the example 2.2.

Number of rotten potatoes    Tally      Frequency    Relative frequency
0                            ||||/      5            5/20 = 0.25
1                            ||||/ |    6            6/20 = 0.30
2                            ||||       4            4/20 = 0.20
3                            ||||       4            4/20 = 0.20
4                            |          1            1/20 = 0.05


If the range of observations in the data is large, the same method is adopted as 
has been explained for the continuous data. 





Open-end classes 


In connection with frequency tables, the term open-end classes is sometimes used. It means that in a frequency table, either the lower limit of the 1st class or the upper limit of the last class is not a fixed number. It may happen that both of these are not fixed numbers. Frequency tables with open-end classes are formed in some practical situations. The frequency table about the age of people in a certain locality is given in the adjacent table:

65-74

2.5 Cumulative Frequency Distribution 
A cumulative frequency distribution is a table that displays class intervals and the corresponding cumulative frequencies. The cumulative frequency is denoted by c.f and, for a class interval, it is obtained by adding the frequencies of all the preceding classes including that class. It indicates the total number of values less than or equal to the upper limit of that class.


The relative frequencies, cumulative frequencies and cumulative relative 
frequencies for data for the example 2.1 are given in Table 2.4. 


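Table 2.4: Cumulative distribution for the example 2.1.

Class boundaries    Frequency    Relative frequency    Cumulative frequency (c.f)    Cumulative relative frequency (c.r.f)
85.5-90.5           6            0.200                 6                             6/30 = 0.200
90.5-95.5           4            0.133                 6+4 = 10                      10/30 = 0.333
95.5-100.5          10           0.333                 10+10 = 20                    20/30 = 0.667
100.5-105.5         6            0.200                 20+6 = 26                     26/30 = 0.867
105.5-110.5         3            0.100                 26+3 = 29                     29/30 = 0.967
110.5-115.5         1            0.033                 29+1 = 30                     30/30 = 1.000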

As the cumulative frequency of a class indicates the total number of values less than or equal to the upper limit of that class, the cumulative frequency of 20 for the class 95.5-100.5 means that 20 values are less than 100.5. Similarly, the cumulative frequency of the last class 110.5-115.5 is 30, indicating that 30 values are less than 115.5.

If we want to compare two or more distributions, we compute relative cumulative frequencies or percentage cumulative frequencies because these would be comparable. Otherwise, the differences in sample sizes will distort comparisons.


The cumulative relative frequencies, which are the proportions of the cumulative frequency, are denoted by c.r.f and are obtained by dividing the cumulative frequency by the total frequency. The c.r.f of a class can also be obtained by adding the relative frequencies of the preceding classes including that class. As cumulative relative frequencies are proportions, multiplication by 100 gives the corresponding percentage cumulative frequencies.

For example 2.1, the relative cumulative frequency for the first class interval is 6/30 = 0.2, for the second class interval it is 10/30 = 0.33 and so on. The percentage cumulative frequency for each class can be obtained by multiplying its cumulative relative frequency by 100. The percentage cumulative frequency for 0.200 is (0.200)(100) = 20, for 0.333 it is (0.333)(100) = 33.3 and so on; the percentage cumulative frequency for 1.000 is (1.000)(100) = 100.
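
As a rough check of the cumulative columns for Example 2.1, the running totals can be computed with `itertools.accumulate`. This is a minimal sketch; the variable names are illustrative and the frequencies are those of the grouped height data.

```python
from itertools import accumulate

# Class frequencies of Example 2.1, in class order (85.5-90.5 first).
frequencies = [6, 4, 10, 6, 3, 1]
total = sum(frequencies)

# Cumulative frequency: running total of the frequencies.
cum_freq = list(accumulate(frequencies))

# Cumulative relative frequency: cumulative frequency / total frequency.
cum_rel_freq = [cf / total for cf in cum_freq]

# Percentage cumulative frequency: cumulative relative frequency x 100.
pct_cum_freq = [100 * crf for crf in cum_rel_freq]

for cf, crf, pct in zip(cum_freq, cum_rel_freq, pct_cum_freq):
    print(cf, round(crf, 3), round(pct, 1))
```

The same three lines of arithmetic apply unchanged to discrete data, with one frequency per value instead of one per class.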


2.5.1 Cumulative frequency distribution for discrete data 


The cumulative frequency distribution for the discrete data is obtained in the 
same way as for the continuous data i.e., the cumulative frequency of a class is 
obtained simply by adding the preceding frequencies including the frequency for that 
class. The relative frequencies and the cumulative frequencies for the data of example 
2.2. are given below in Table 2.5. 


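Table 2.5: Cumulative frequency distribution of the example 2.2.

Number of rotten potatoes    Frequency    Relative frequency    Cumulative frequency
0                            5            5/20 = 0.25           5
1                            6            6/20 = 0.30           5+6 = 11
2                            4            4/20 = 0.20           11+4 = 15
3                            4            4/20 = 0.20           15+4 = 19
4                            1            1/20 = 0.05           19+1 = 20
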


Example 2.4: Find out the relative frequency distribution for the following data, where x denotes the number of hours worked in a day by a person in a locality of 265 people.

Solution:

Frequency    Relative frequency
24           24/265 = 0.09
66           66/265 = 0.25
80           80/265 = 0.30
48           48/265 = 0.18
28           28/265 = 0.11
14           14/265 = 0.05
4            4/265 = 0.02
1            1/265 = 0.00





Example 2.5: Find out the relative cumulative frequency distribution from the data of example 2.4.

Solution:

Frequency    Cumulative frequency    Relative cumulative frequency
24           24                      24/265 = 0.09
66           24 + 66 = 90            90/265 = 0.34
80           90 + 80 = 170           170/265 = 0.64
48           170 + 48 = 218          218/265 = 0.82
28           218 + 28 = 246          246/265 = 0.93
14           246 + 14 = 260          260/265 = 0.98
4            260 + 4 = 264           264/265 = 1.00
1            264 + 1 = 265           265/265 = 1.00


2.6 Graphic representation of Data 


There are reasons for drawing graphs, the most compelling being that one simple graph says more than twenty pages of prose. Many graphs just represent a summary of data that have been collected to support a particular theory. It is usually suggested that the graphic representation of the data should be looked at before proceeding to the formal statistical analysis.


Common uses of graphs 


i) Graphs are useful for checking assumptions made about the data, i.e., the probability distribution assumed.

ii) Graphs provide a useful subjective impression as to what the results of the formal analysis should be. This serves as a check on the calculations and the statistical methodology used. Always believe your common sense before arithmetic calculations, because for some problems the calculations will be obvious from a graph.

iii) Graphs often suggest the form of the statistical analysis to be carried out, particularly the graph of a model fitted to the data.

iv) Graphs give the reader a visual representation of the data or of the results of a statistical analysis, which is usually easily understandable and more attractive.

v) Some graphs are useful for checking the variability in the observations, and outliers can be easily detected.


Outliers are data values which are highly inconsistent with the main body of the data. These may arise because of mistakes in copying or coding, or they may be values that genuinely differ from the rest of the data.


Important points for drawing graphs 


There are a number of points worth keeping in mind when drawing graphs. The most important of these are:

i) clearly label the axes with the names of the variables and the units of measurement.

ii) keep the units along each axis uniform regardless of the scales chosen for the axes.

iii) keep the diagram simple. Avoid any unnecessary details.

iv) a clear and concise title should be chosen to make the graph meaningful.

v) if the data on different graphs are to be compared, always use identical scales.

vi) in scatter plots, don't join up the dots. This makes it likely that you will see apparent patterns in any random scatter of points.


The general approach, which should be used to analyze the data, is as follows: 


i) construct an appropriate diagram and summary of the data and come to an 
initial impression concerning the question posed. This is known as 
exploratory data analysis. 


ii) follow this up with an appropriate formal analysis of data. 


iii) compare the results of the formal analysis with your initial impression, and worry if they differ greatly.

The methods described here are appropriate for data on a single variable. Usually, the data will be measured on a continuous scale or, at least if this is not the case, the set of possible values will be reasonably large. The types of graphs commonly used are given below:


2.6.1 Simple Bar Diagram 


To get an impression of the distribution of a discrete or categorical data set, it 
is usual to represent it by a bar diagram. To construct a bar diagram, the values of the 
variable or categories are taken along x-axis and a bar with height equal to its 
frequency is drawn on each category. 


The first step is to make a tally count of the data to help us make a frequency distribution. The procedure is explained using example 2.2. The frequency distribution is given in Table 2.5 (a).

Table 2.5 (a): Frequency distribution for the data of example 2.2.

Number of rotten potatoes    Frequency
0                            5
1                            6
2                            4
3                            4
4                            1

To construct a bar diagram, the number of rotten potatoes is taken along the x-axis. The number of rotten potatoes varies from 0 to 4, so we mark the x-axis with 0, 1, 2, 3 and 4. The value 0 has frequency 5, so a bar of height 5 is drawn along the y-axis at point 0 on the x-axis. Similarly, a bar of height 6 is drawn at point 1 on the x-axis; bars of height 4 are drawn at points 2 and 3; and finally a bar of height 1 is drawn at point 4. It is shown in Figure 2.1.

Figure 2.1: Bar diagram of rotten potatoes (x-axis: number of rotten potatoes; y-axis: frequency).


The gaps between the bars in the bar chart emphasize the gaps between the 
values that the discrete variable can take. 
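
The construction can be imitated in plain text. The sketch below is only an illustration: `text_bar_chart` is a hypothetical helper, and it draws the bars horizontally (one row per value) because that is simpler to print than the vertical bars of Figure 2.1.

```python
# Frequencies of the number of rotten potatoes (Example 2.2).
frequencies = {0: 5, 1: 6, 2: 4, 3: 4, 4: 1}

def text_bar_chart(freqs):
    """Return one text line per value; the bar length equals the frequency."""
    lines = []
    for value, f in freqs.items():
        lines.append(f"{value} | {'#' * f} {f}")
    return lines

for line in text_bar_chart(frequencies):
    print(line)
```

Each value gets its own separate row, which mirrors the gaps left between the bars for a discrete variable.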


2.6.2 Multiple Bar Diagram 


It is an extension of the simple bar diagram and is used to represent two or 
more related sets of data in the form of groups of simple bars. Its main purpose is to 
compare same characteristics of a variable. 


Example 2.6: The following data are about the production of wheat in different localities of the Punjab for the years 1987 to 1989.

Production in kg (thousands)

                1987    1988    1989
Locality I      500     600     200
Locality II     600     700     400
Locality III    800     700     500

Draw an appropriate diagram for this data.



Solution: The appropriate diagram seems to be a multiple bar diagram because 3 bars, one for each locality, for each year will allow comparison of the production of the three localities over time.


To draw multiple bar diagram, the years are taken along x-axis and for each 
year three bars are drawn along y-axis, one for each locality to indicate the 
production. The bars showing production for year 1987 for three localities are put 
together, the bars for 1988 for three localities are put together and the bars for 1989 
for three localities are put together and are shown in Figure 2.2. The bars are shaded 
individually to differentiate from each other. 







Figure 2.2: Multiple bar diagram of different localities (x-axis: years 1987, 1988, 1989; y-axis: production).


2.6.3 Sub-divided Bar Diagram (Component Bar Diagram) 


There are certain situations where the simple bar diagram represents the totals 
and it is possible to divide it further into different segments. For example, if the 
simple bar denotes the total population of insects caught in a field then it is possible 
to sub-divide it into male and female proportions. 


Example 2.7: There were 500 people of blood group A (kind 1), 300 of blood group B (kind 2) and 400 of blood group O (kind 3). After classification, it was observed that for kind 1 there were 200 females, for kind 2 there were 100 females and for kind 3 there were 200 females.


Draw an appropriate diagram for this data. 


Solution: The sub-divided bar diagram is useful in this situation to represent the number of males and females in each category. First construct simple bars and then divide each one according to the number of males and females in that blood group category. The simple bar and sub-divided bar diagrams are shown in Figure 2.3(a) and 2.3(b) respectively.

Figure 2.3: Simple and sub-divided bar diagrams (x-axis: Kind 1, Kind 2, Kind 3)

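
The segment heights of the sub-divided bars in Example 2.7 follow from simple subtraction. The sketch below assumes, as the solution implies, that everyone not counted as female is male; the dictionary names are illustrative only.

```python
# Totals and female counts per blood group category (Example 2.7).
totals = {"Kind 1": 500, "Kind 2": 300, "Kind 3": 400}
females = {"Kind 1": 200, "Kind 2": 100, "Kind 3": 200}

# Each simple bar (the total) is split into a female and a male segment.
# Assumption: the male segment is the total minus the female count.
components = {
    kind: {"female": females[kind], "male": totals[kind] - females[kind]}
    for kind in totals
}

for kind, parts in components.items():
    print(kind, parts, "total =", parts["female"] + parts["male"])
```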
2.6.4 Pie Diagram 


The pie chart or pie diagram is a division of a circular region into different sectors. It is constructed by dividing the total angle of a circle, 360 degrees, into different components. The angle θ for each sector is obtained by the relation:

θ = (Component part / Total) × 360

Each sector is shaded with different colours or marks so that they look separate from each other.

It is a useful way of displaying the data where the division of a whole into component parts needs to be presented. It can also be used to compare such divisions at different times.


Example 2.8: Data are available regarding the total production of urea fertilizer and its use on different crops. The total production of urea is 200 thousand kg and its consumption for the crops wheat, sugarcane, maize and lentils is 75, 80, 30 and 15 thousand kg respectively. Make an appropriate diagram to represent these data.


Solution: The appropriate diagram seems to be a pie chart because we have to present a whole divided into 4 component parts. To construct a pie chart, we calculate the proportionate arc of the circle for each crop, i.e.,

(75/200) × 360 = 135, (80/200) × 360 = 144, (30/200) × 360 = 54, (15/200) × 360 = 27

The proportionate arcs of a circle (in degrees) for the different crops are given in Table 2.6:

Table 2.6: Proportionate arc of a circle for crops.

Crops        Consumption (thousand kg)    Proportionate arc of the circle (degrees)
Wheat        75                           135
Sugarcane    80                           144
Maize        30                           54
Lentils      15                           27
Total        200                          360



Draw a circle of an appropriate radius and make the angles clockwise or anti-clockwise with the help of a protractor or any other device, i.e., for wheat make an angle of 135 degrees, for sugarcane an angle of 144 degrees, for maize an angle of 54 degrees and for lentils an angle of 27 degrees, and hence the circular region is divided into 4 sectors. Shade each sector with different colours or marks so that they look separate from each other. The pie diagram is given in Figure 2.4.

Figure 2.4: Pie diagram with sectors for wheat, sugarcane, maize and lentils.
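
The sector angles of Example 2.8 can be reproduced with a few lines of Python. This is a minimal sketch of the relation angle = (component part / total) × 360; the names `consumption` and `angles` are illustrative.

```python
# Urea consumption by crop, in thousand kg (Example 2.8).
consumption = {"Wheat": 75, "Sugarcane": 80, "Maize": 30, "Lentils": 15}
total = sum(consumption.values())

# Angle of each sector in degrees: (component part / total) x 360.
# Multiplying before dividing keeps the arithmetic exact for these values.
angles = {crop: part * 360 / total for crop, part in consumption.items()}

for crop, angle in angles.items():
    print(f"{crop}: {angle:.0f} degrees")

# The sector angles of a complete pie always add up to 360 degrees.
print("sum of angles:", sum(angles.values()))
```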
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2.6.5 Histogram 


A histogram is a useful graphic representation of data for getting a visual impression of its distribution. It is constructed from grouped data by taking class boundaries along the x-axis and the corresponding frequencies along the y-axis. If the data are in ungrouped form, then the first step is to arrange the data in the form of a grouped frequency distribution before making a histogram. The histogram may be constructed for the following two types of quantitative data.


i) Continuous grouped data
ii) Discrete grouped data

Histogram: For Continuous grouped data

For continuous grouped data the frequency distribution may be with equal class width or with unequal class width, depending upon the nature of the data. To draw a histogram from a continuous grouped frequency distribution, the following steps are taken. The first two steps are common to equal and unequal class widths but the remaining step is different.

i) Mark the class boundaries of the classes along the x-axis.

ii) Mark frequencies along the y-axis.

iii) Draw a rectangle for each class such that the height of each rectangle is proportional to the frequency corresponding to that class. This is the case when classes are of equal width, as they often are.

iv) If the classes are of unequal width, then the area instead of the height of each rectangle is proportional to the frequency corresponding to that class, and the height of each rectangle is obtained by dividing the frequency of the class by the width of that class.


Histogram: For Equal class Interval (data of table 2.1)

To construct a histogram, take the following steps:

i) Mark the class boundaries 85.5-90.5, 90.5-95.5, ..., 110.5-115.5 on the x-axis.

ii) The maximum frequency is 10, so label the y-axis from 0 to 10.

iii) The frequency of the 1st class is 6, so its rectangle is raised up to 6, the rectangle of the second class is raised up to 4 and so on; the last rectangle is raised to height 1. The histogram is shown in Figure 2.5.

Figure 2.5: Histogram of student height data (x-axis: class boundaries from 85.5 to 115.5; y-axis: frequencies).

It may be noted that the area under a histogram can be calculated by adding up the areas of all the rectangles that constitute the histogram. The area of one rectangle is obtained by multiplying the width of the class by the corresponding frequency, i.e.,

Area of a single rectangle = width of the class × frequency of the class.

The area of the above histogram is
5(6) + 5(4) + 5(10) + 5(6) + 5(3) + 5(1) = 150


Histogram: For Unequal Class Intervals 

The principal reason for making a histogram with equal class intervals is that the frequencies from class to class are directly comparable. However, there may be situations where unequal class intervals are appropriate: firstly, in highly skewed distributions and secondly, in the grouping of similar cases. In such situations, while constructing a histogram, the width of the classes should be taken into account because the area of each rectangle is proportional to the frequency. This can be achieved by adjusting the heights of the rectangles. The height of each rectangle is obtained by dividing the frequency of the class by the width of that class.

Example 2.9: The frequency distribution of ages (in years) of 51 members of a locality is available in the adjacent table. Draw a histogram for this data.

Solution: A look at this data indicates that the widths of the class intervals are not equal: the first class has width 2; the second, third and fourth classes have width 4; the fifth has width 6; and the last class has width 8. So there is a need to adjust the heights of the rectangles, i.e., for the first class we have 2 as the width of the class and 5 as the frequency, so the height of the first rectangle is 5/2 = 2.5. Similarly, for the others 10/4 = 2.5, 12/4 = 3, 14/4 = 3.5, 6/6 = 1.0 and 4/8 = 0.5. These heights are also called adjusted frequencies. The widths of the classes and the corresponding heights of the rectangles are given in Table 2.7.


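Table 2.7: Frequency distribution of the example 2.9 with adjusted heights.

Class width    Frequency    Height of rectangle (adjusted frequency)
2              5            5/2 = 2.5
4              10           10/4 = 2.5
4              12           12/4 = 3.0
4              14           14/4 = 3.5
6              6            6/6 = 1.0
8              4            4/8 = 0.5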





Taking class boundaries along the x-axis and the corresponding adjusted frequencies along the y-axis, rectangles are drawn and the histogram is given in Figure 2.6.

Figure 2.6: Histogram for unequal class intervals (x-axis: class boundaries; y-axis: adjusted frequency).
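
The adjustment for unequal class widths can be checked numerically. The sketch below uses the class widths and frequencies quoted in the solution of Example 2.9 (the variable names are illustrative) and confirms that, once each height is the frequency divided by the width, the area of each rectangle equals the class frequency.

```python
# Class widths and frequencies as quoted in the solution of Example 2.9.
widths = [2, 4, 4, 4, 6, 8]
frequencies = [5, 10, 12, 14, 6, 4]

# Adjusted frequency (height of rectangle) = frequency / class width.
adjusted_heights = [f / w for f, w in zip(frequencies, widths)]

# Area of each rectangle = height x width; it recovers the class frequency.
areas = [h * w for h, w in zip(adjusted_heights, widths)]

for w, f, h, a in zip(widths, frequencies, adjusted_heights, areas):
    print(f"width={w}  frequency={f}  height={h}  area={a}")

print("total area:", sum(areas))  # equals the total frequency
```

With equal class widths the division only rescales every height by the same constant, which is why plain frequencies can be used as heights in that case.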


Histogram: discrete data 


It should be noted that bar graphs are usually drawn for discrete and categorical data, but in some situations where approximations are needed, a histogram may be constructed.

To construct a histogram for discrete grouped data, the following steps are taken:

i) mark possible values along the x-axis.

ii) mark frequencies along the y-axis.

iii) draw a rectangle centered on each value with equal width on each side, possibly 0.5 to either side of the value.

The procedure is explained for example 2.2.

The number of rotten potatoes varies from 0 to 4, so the x-axis is marked 0, 1, 2, 3, 4. The maximum frequency is 6, so the y-axis is marked from 0 to 6. A rectangle is drawn centered on each value whose height is equal to the corresponding frequency. The resulting diagram for the data is given in Figure 2.7.

Figure 2.7: Histogram for rotten potatoes (x-axis: number of rotten potatoes; y-axis: frequency).




Advantages: The advantages of the histogram as compared to the unprocessed data are:

i) it gives the range of the data.

ii) it gives the location of the data.

iii) it gives a clue about the skewness of the data.

iv) it gives information about an out-of-control situation.


2.6.6 Frequency polygon and frequency curve


A frequency polygon is a closed geometric figure used to display a frequency 
distribution graphically. Following steps are taken to make a frequency polygon from 
a frequency distribution. 


i) Calculate mid values of the class boundaries.

ii) Mark these mid values along the x-axis.

iii) Mark the frequencies along the y-axis.

iv) Plot the corresponding frequency against each mid point, join the points and extend the polygon to the x-axis.


It can also be obtained by joining the upper mid points of the rectangles of a histogram and extending the ends to the x-axis. The distance from the x-axis to the plotted point corresponds to the frequency of the class. The smoothed frequency polygon is called a frequency curve, which is useful for getting a visual impression of the data, i.e., it may help to know about the symmetry or skewness of the data. If we are interested to compare two distributions ... number of observations less than this is zero. For the grouped data of example 2.1, it is shown in Figure 2.8. It is clear that the cumulative frequency polygon is an increasing function which starts from the lower class boundary of the first class at zero height and ends at the upper boundary of the last class with height equal to the total frequency.

Figure 2.8: Frequency Polygon

This graph may be drawn using upper class boundaries and cumulative relative frequencies, in which case it is called the cumulative frequency function or polygon and can be used to locate certain values. It can be used to locate the quartiles or percentiles of the data. Figure 2.9 indicates the observation corresponding to the c.r.f. of 0.25.

Figure 2.9: Cumulative frequency polygon of students height data (x-axis: upper class boundaries).

We multiply the cumulative relative frequencies by hundred to get the corresponding percentage cumulative frequencies. The c.r.f. polygon then shows percentages along the y-axis instead of c.r.f. The graph on the right hand side of Figure 2.9 becomes a percentage cumulative frequency polygon if we replace 0.200 by 20, the percentage cumulative frequency for 0.200, as (0.200)(100) = 20; 0.333 by 33.3, the percentage cumulative frequency for 0.333, as (0.333)(100) = 33.3; and so on, up to 1.00 by 100, the percentage cumulative frequency for 1.
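
Reading a value such as the first quartile off the cumulative frequency polygon amounts to linear interpolation between two plotted points. The sketch below illustrates this for the height data of Example 2.1. It is an assumption-laden illustration rather than a method stated in the text: `value_at` is a hypothetical helper, and the polygon is taken to join the points (class boundary, c.r.f.) by straight lines, starting at height zero at the lower boundary of the first class.

```python
# Points of the cumulative relative frequency polygon for Example 2.1:
# the lower boundary of the first class has height 0, then each upper
# class boundary carries the cumulative relative frequency of its class.
boundaries = [85.5, 90.5, 95.5, 100.5, 105.5, 110.5, 115.5]
cum_rel_freq = [0, 6 / 30, 10 / 30, 20 / 30, 26 / 30, 29 / 30, 30 / 30]

def value_at(p):
    """Return the x value where the polygon reaches height p (0 <= p <= 1)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    for i in range(1, len(boundaries)):
        if p <= cum_rel_freq[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cum_rel_freq[i - 1], cum_rel_freq[i]
            # Straight-line interpolation inside the class.
            return x0 + (p - y0) / (y1 - y0) * (x1 - x0)
    return boundaries[-1]

print(value_at(0.25))  # observation corresponding to the c.r.f. of 0.25
print(value_at(0.50))  # observation corresponding to the c.r.f. of 0.50
```

A graphical reading from Figure 2.9 gives the same answer up to drawing accuracy.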


Consider the following steps to draw a cumulative frequency polygon for 
discrete variable. 


i) Choose the horizontal axis on graph paper and mark the data points from the smallest to the largest.

ii) Mark the vertical axis from zero to the total frequency.

iii) Make a vertical jump of height equal to its frequency at the first point. Move horizontally from the top of this point until you are exactly above the second data point and make a jump equal to its frequency at the second point. Repeat this for all the data values.

The cumulative frequency polygon for the data of example 2.2 is shown in Figure 2.10.

Figure 2.10: Cumulative frequency polygon (x-axis: number of potatoes; y-axis: cumulative frequency from 0 to 20).


It is clear from Figure 2.10 that there is a jump at each data value whose height is equal to its frequency. The cumulative frequency polygon is flat and horizontal between the data values. It starts from a height of zero on the left and goes to a height equal to the total frequency on the right, being an increasing function between the smallest and largest values of the data set.

This graph may also be drawn by taking data values along the x-axis and the corresponding cumulative relative frequencies or percentage cumulative frequencies along the y-axis, the relative frequencies being the heights of the jumps at each point. In such a case it is called the discrete cumulative frequency function or percentage cumulative frequency function or polygon.
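
The step shape described above can be expressed as a small function. This is a sketch under the frequencies of Example 2.2; `cumulative_frequency` is a hypothetical name, and the function returns the number of observations less than or equal to x.

```python
from bisect import bisect_right
from itertools import accumulate

# Values and frequencies of the number of rotten potatoes (Example 2.2).
values = [0, 1, 2, 3, 4]
frequencies = [5, 6, 4, 4, 1]
cum_freq = list(accumulate(frequencies))   # height reached after each jump

def cumulative_frequency(x):
    """Number of observations less than or equal to x (a step function)."""
    k = bisect_right(values, x)            # how many data values are <= x
    return cum_freq[k - 1] if k > 0 else 0

# Flat between data values, with a jump of size frequency at each value.
for x in [-1, 0, 0.5, 1, 2.5, 4, 10]:
    print(x, cumulative_frequency(x))
```

Dividing the returned value by the total frequency gives the discrete cumulative relative frequency function mentioned above.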


2.6.7 Scatter Plots 


Very often, many variables are measured on each individual. For example, we may consider two variables, height and weight of each individual in a class. Now, the resulting data set consists of n pairs of observations (x_i, y_i), i = 1, 2, ..., n, where each x_i denotes height and each y_i denotes weight. This is called a bivariate data set. A plot of two variables useful in such situations is the scatter plot. It is obtained by taking one variable on the x-axis and the other on the y-axis. Each pair of values (x_i, y_i), i = 1, 2, ..., n, in the data set contributes a point to this bivariate plot, and we usually put a cross (x) or dot (.) at the intersection of the values.


A scatter plot is the best way of studying bivariate problems. The bivariate data are 
usually of the following types: 


i) Paired measurements on the same variable 


The data come from situations where experimental units are deliberately paired, for example, the use of twins in biological and psychological experiments. Here we would expect results within a pair of twins to be more alike than observations between different pairs. In such situations, the main interest is to investigate whether the variables are dependent and, if so, what the form of the relationship between the variables actually is.


Example 2.10: Data are recorded on milk yield of cows in the morning and in the 
evening. 


Morning values: 4.5, 6.0, 5.5, 3.5, 4.5, 6.5, 7.0, 5.0, 4.5, 6.5 
Evening values: 5-55,0.950:0,°5.5,-7.0, 5.5. 8.0, 6.0, 3:5, 7.0 


The interest is whether the characteristic measured varies in any systematic 
pattern over the day. 


Solution: As both measurements are on the same variable, the interest is not just in the relationship between the morning and evening measurements but also in comparing them. The line of equality is a useful visual aid for this type of data. For the scatter plot we take the morning values along the x-axis and the evening values along the y-axis.

At the intersection of 4.5 and 5.5 we put a dot (.), similarly for 6.0 and 6.5, and so on. The scatter plot is shown in Figure 2.11.


Figure 2.11: Scatter plot of morning and evening values (x-axis: morning values; y-axis: evening values).


The line of equality is at an angle of 45° as indicated in Figure 2.11 along with the scatter plot. It is clear that most of the points are above the line of equality, so the milk yield in the morning and evening is not the same and more milk is obtained in the evening than in the morning.

With this data we can also look at the differences between morning and evening values and treat this as a one-sample problem. However, we would no longer be able to see if the change was related to the initial value.


ii) Two related measurements 


The pair of values may come from two variables which are related to each 
other. For example, samples of soil nitrogen and yield of a variety are taken in each of 
seven randomly selected agriculture locations. In such situations, it doesn’t make any 
sense to compare them as both the measurements are not on the same variable. 


Scatter plots are also drawn to examine the relationship between two related measurements.

Example 2.11: Samples of soil nitrogen and yield are taken in each of seven randomly selected agriculture locations. The soil nitrogen and yield are:

[ Mocation | 1 [2 [37415] 6] 7_ 
70 [65 | 60 | 70 | 80 | 73 | 
| yield (ke) | 9.0 | 7.5 | 6.0°]'45 [76.0 [7.0 | 6.0 | 


Here, the question arises whether the soil nitrogen and yield are related? 







Solution: It makes no sense to compare them as both the measurements are not on the 
same variable. 


For scatter diagram, we take soil nitrogen along x-axis and yield along 
y-axis. The scatter plot is in figure 2.12. 


The scatter plot indicates that as the soil nitrogen increases, yield also 
increases. 
Figure 2.12: Scatter plot of soil nitrogen and yield (x-axis: soil nitrogen; y-axis: yield)


2.7 Bivariate Frequency Distribution 


A frequency distribution constructed by considering two variables at a time is
called a bivariate frequency distribution. The pairs of observations are taken into
account while constructing it. The procedure is the same as for a single variable,
except that the n pairs of values $(x_i, y_i)$, i = 1, 2, ..., n, are allocated to the
cells at the intersection of the classes of the two variables.
Example 2.12: A sample of 25 students was taken and their heights in feet (x) and 
weights in kilograms (y) were measured. The pairs below are (height, weight). 
(5.5, 60), (5.0, 55), (4.3, 46), (5.3, 67), (4.9, 48), (5.9, 69), 
(5.4, 67), | (4.8, 55), (32,57), (5.8, 67), (53,573: (5.7, 65), 
(5.8, 63), (5.9, 65), (4.8, 49), (613, 55), (5.1, 60), (5.7, 65), 
(4.7, 50), (4.5, 50), (5.3, 60), (4.6, 53), (5.4, 62), (5.2, 59); 
(4.7, 55). 


Solution: The minimum and maximum height is 4.3 and 5.9 feet respectively. The 
minimum and maximum weight is 46 and 69 kilogram respectively. 


For height, we take 4 classes with an interval of 0.5. So, the class limits for 
height are 4.0-4.4, 4.5-4.9, 5.0-5.4, 5.5-5.9. The corresponding class boundaries are 
3.95-4.45, 4.45-4.95, 4.95-5.45, 5.45-5.95. As usual, the class boundaries have been 
obtained by averaging the upper class limit of one class and the lower class limit of 
the next class i.e., the first class boundary is 


(4.4+4.5)/2 = 4.45 and so on. 


For weight, we take 5 classes with an interval of 5.0. So, the class limits are 
44-49, 50-54, 55-59, 60-64, 65-69. The corresponding class boundaries are 44.5-49.5, 
49.5-54.5, 54.5-59.5, 59.5-64.5, 64.5-69.5. 


Starting from the first pair, all 25 pairs are assigned to the classes to which they
belong. The first pair (5.5, 60) falls in the class with height 5.45-5.95 and weight
59.5-64.5, so a tally mark is made in the table at their intersection. The second pair
(5.0, 55) belongs to the class with height 4.95-5.45 and weight 54.5-59.5, so a tally
mark is made at that intersection, and so on; the last pair (4.7, 55) belongs to the
class with height 4.45-4.95 and weight 54.5-59.5. The number of tally marks in each
cell gives the frequency of the class with those height and weight boundaries.
The bivariate frequency distribution is given in Tables 2.8 (a) and 2.8 (b).
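
The tallying procedure described above can be sketched in a few lines of Python. This is only an illustration: it uses the class boundaries from the solution and, as an assumption made for brevity, just the first six pairs of Example 2.12. The names `class_index` and `table` are hypothetical helpers, not part of the text.

```python
from bisect import bisect_right
from collections import Counter

# Class boundaries taken from the solution above.
height_bounds = [3.95, 4.45, 4.95, 5.45, 5.95]
weight_bounds = [44.5, 49.5, 54.5, 59.5, 64.5, 69.5]

# First six (height, weight) pairs of Example 2.12, used as a small illustration.
pairs = [(5.5, 60), (5.0, 55), (4.3, 46), (5.3, 67), (4.9, 48), (5.9, 69)]


def class_index(value, bounds):
    """Return the 0-based index of the class whose boundaries contain value."""
    idx = bisect_right(bounds, value) - 1
    if not 0 <= idx < len(bounds) - 1:
        raise ValueError(f"{value} lies outside the class boundaries")
    return idx


# Each pair adds one tally mark to its (height class, weight class) cell.
table = Counter(
    (class_index(h, height_bounds), class_index(w, weight_bounds))
    for h, w in pairs
)

for (i, j), freq in sorted(table.items()):
    print(
        f"height {height_bounds[i]}-{height_bounds[i + 1]}, "
        f"weight {weight_bounds[j]}-{weight_bounds[j + 1]}: {freq}"
    )
```

Running the same loop over all 25 pairs would give the cell frequencies of the bivariate table.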


Table 2.8 (a): Bivariate frequency table of height and weight, obtained after making a tally count.


It is clear from the bivariate frequency table that there is one individual with
weight between 44.5-49.5 kg and height between 3.95-4.45 feet. There are two
individuals with weight between 44.5-49.5 kg and height between 4.45-4.95 feet, and
so on; there are 5 individuals with weight between 64.5-69.5 kg and height between
5.45-5.95 feet.
Exercise 2 (answers on page 250)


2.1 What are the different methods of representation of statistical data?

2.2 Define the histogram, the frequency polygon and the frequency curve.

2.3 What do you understand by classification and tabulation? Discuss their
importance in a statistical analysis.

2.4 Distinguish between one-way and two-way tables. Illustrate your answers
with examples. Also explain the following:
i) Classification according to attributes.
ii) Class limits.
iii) Length of class interval.
iv) Class frequency.

2.5 The following table gives the details of monthly budgets of two families.
Represent these figures through a suitable diagram.


Family A Family B 






eotking [Rs 100 100 
| 
fc aE NE Ra 


2.6 Represent the following data through a pie diagram.


aes A te 
2000 
















Fuel and light 
Miceboes P| 


; 


2.7. Define frequency Histogram. Draw a Histogram for the following frequency 
distribution giving the steps involved. 





2.8 i. a) Write down the important points for drawing graphs.

b) In order to estimate the mean length of leaves from a certain tree, a
sample of 100 leaves was chosen and their lengths were measured in
millimetres. A grouped frequency table was set up and the results were
as follows:


fFrequeney [3 [5] 8|[e[wl[m| wm, s |? | 






ii. a) Display the table in the form of a frequency polygon. 
b) What are the boundaries of the interval whose mid point is 3.7 cm? 
2.9 In a locality, the total area is 500 acres, of which 250 acres are under sugarcane,
125 acres are under maize, 60 acres are under wheat and the remaining 65 acres
are under other crops. Make a pie diagram to represent the distribution of
acreage under different crops.

2.10 A biologist was interested to know whether male or female spiders are
longer. He collected random samples of female and male green lynx spiders,
with the lengths given below. Advise him.


Length of female (in mm) 
Aes 2 ae a 
2 Sa, age x! em” EF 

y Seren: FS Se Ne 
C2. Dae? wee, 58 
48 59 G2: 638 
mS eM i 


Length of male (in mm) 
Bde 9.9. 1. 5.0 8.4 
26° 70° ~-6G=- 73 
da O83. 6.3 8.3 
8.0 9.1 6.3 8.4 
Schnee. TS 2iIbe2 


2.11 i) What is meant by tabulation? Explain the main steps which are generally
taken in tabulation.

ii) What is a frequency distribution? How is it constructed?

2.12 The following data give the lifetime in minutes, recorded to the nearest tenth
of a minute, of 50 sprayed insects.
122 0.7 39--Li 29. 14. Beimawocwss 
2oyae0.9 3.4 28. 3.7 yao O04 2.8 daa 
3.9 63 25-2 2S 2 Oo. Aa ae 12k 
3.5 29 V2 eo eat ee Cakes aes a 
MOr liegt ol) Le ew. See 2 
Using 8 intervals with the lowest starting at 0.1:

i) Form a frequency distribution and a cumulative frequency distribution.

ii) Also draw a histogram and a frequency polygon for the frequency distribution so
formed.

2.13 i) What are the advantages of diagrammatic representation?
ii) Explain the following:
a) A bar diagram b) Subdivided bar diagram c) Multiple bar diagram.
2.14 The following data gives the record of a company’s savings over the years. 
Draw a bar diagram to represent it. 





2.15 Draw a sub-divided bar diagram to represent the male and female population 
of four divisions of Punjab in 1961. 







Division | Male | Female | Both Sexe 
Lahore 





Er PERM EE, Pt <a 
Sargoha|__ 32 Pw 
Rawalping 


2.16 The following information is available about the ages (in years) of some
individuals and their weights in kilograms, given as (age, weight). Make a
scatter plot and a bivariate frequency distribution.
(20, 60), (15, 55), (14, 45), (17,60), (16, 48), (22, 70), (16, 63), 


(14, 55), (18, 57), (19, 67), (21, 67), (17, 65), (13, 60), (15, 60), 
(17, 49), (19, 65), (23, 73), (21, 65), (22, 70), (14, 50), (16, 60), 
(1S, B39), CUS O29. CUE, SO} a 73): 


2.17 Fill in the blanks:

i) Classification is the ______ of arranging data according to some
common characteristics.
ii) A table has at least ______ parts.
iii) In an open-end frequency distribution, either the ______ class limit
of ______ group or upper limit of the ______ class are not given.
iv) In Histogram, with unequal class intervals, the area of each rectangle is
______ to class frequency.
v) An ogive is a ______ polygon.
vi) A frequency table can be represented graphically by a ______.
vii) A Histogram is a bar chart with ______ space between its bars.
viii) The area of each bar is ______ to the frequency it represents.
ix) If mid-points of the tops of the consecutive bars in a Histogram are
joined by straight lines, a ______ is obtained.

2.18 Against each statement write T for true and F for false statement.

i) Grouped data and primary data are same.
ii) The class mark is also named as mid point.
iii) A table has at least three parts.
iv) The graph of a time series is called Histogram.
v) Cumulative frequencies are decreasing.
vi) The data 10, 5, 7, 6, 4 is the example of grouped data.
vii) The two fold division is also named as Dichotomy.
viii) Data can be presented by means of graph.
ix) A graph of cumulative frequency curve is called polygon.
x) In constructing a Histogram, midpoints are to be taken along x-axis.


Measures of Location

3.1 Introduction

The diagrammatic representation of a set of data can give us some impression
of its distribution. Even then, there remains a need for a single quantitative
measure which can be used to indicate the centre of the distribution. The measures
commonly used for this purpose are the mean, median and mode. The geometric mean
and harmonic mean are also sometimes used. These measures are single values which
represent the given data and are known as averages, measures of location or
measures of central tendency. The name measures of location arises because these
measures indicate where a distribution is located. The decision as to which
measure is to be used depends upon the particular situation under consideration.


Properties of a good average are:

i) It is well defined.
ii) It is easy to calculate.
iii) It is easy to understand.
iv) It is based on all the values.
v) It is capable of mathematical treatment.

The important types of averages are:

i) Arithmetic mean and weighted mean
ii) Geometric mean
iii) Harmonic mean
iv) Median
v) Mode


3.2 Arithmetic Mean And Weighted Mean 


The arithmetic mean is calculated by adding up all the observations and dividing
the sum by the total number of observations. The Greek letter μ (mu) is used as a
symbol for the population mean.

The population mean of N observations $Y_1, Y_2, \dots, Y_N$ is defined as:

$$\mu = \frac{1}{N}(Y_1 + Y_2 + \dots + Y_N) = \frac{1}{N}\sum_{i=1}^{N} Y_i \qquad (3.1)$$

The μ is a parameter, a fixed value, and is usually unknown in practice. It is
also called the arithmetic mean, abbreviated as A.M. The estimate of the population
mean μ is the sample mean, denoted by $\bar{Y}$. $\bar{Y}$ is a statistic and its
value varies from sample to sample drawn from the population. The sample mean
$\bar{Y}$ of n observations $Y_1, Y_2, \dots, Y_n$ from a population is defined as:

$$\bar{Y} = \frac{1}{n}(Y_1 + Y_2 + \dots + Y_n) = \frac{1}{n}\sum_{i=1}^{n} Y_i \qquad (3.2)$$


The sample mean is a good estimate of the population mean as it is an 
unbiased statistic. Unbiased simply means that if means are calculated for all 
possible samples drawn from the population, the mean of these sample means would 
be equal to the population mean. The arithmetic mean has the same units as the 
original observation i.e., if the original observations are in centimeters, the unit of 
mean would be in centimeters. 


Example 3.1: Arithmetic mean for ungrouped data.
Following are the data on students' heights (cm):
87, 91, 89, 88, 89, 91, 87, 92, 90, 98.
There are ten observations and their sum is 902.

$Y_i$ | 87 | 91 | 89 | 88 | 89 | 91 | 87 | 92 | 90 | 98 | Total 902

Arithmetic mean $\bar{Y} = \frac{\sum Y_i}{n} = \frac{902}{10} = 90.20$ cm.
3.2.1 Updating or correcting the mean

It may happen that an observation was overlooked or wrongly recorded while
calculating the mean. In such situations, the mean can be corrected without
starting again from the raw data.

Suppose that in Example 3.1 it was found later on that the last observation is
94 instead of 98. To correct the mean, we proceed as follows.

The mean of n observations is $\bar{Y} = \frac{\sum Y_i}{n}$, so

$$\sum Y_i = n\bar{Y} = 10(90.20) = 902$$

The corrected total is obtained by subtracting 98 from the total 902 and adding
94 to it:

Corrected $\sum Y_i$ = 902 - 98 + 94 = 898

So the corrected mean is given by

$$\bar{Y} = \frac{\text{Corrected } \sum Y_i}{n} = \frac{898}{10} = 89.8 \text{ cm}$$
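
The correction can be checked numerically. The short Python sketch below recovers the total from the mean and the sample size, swaps the wrong value for the right one and divides again; the variable names are illustrative only.

```python
# Heights of Example 3.1 as first recorded (last value wrongly entered as 98).
heights = [87, 91, 89, 88, 89, 91, 87, 92, 90, 98]

n = len(heights)
mean = sum(heights) / n

# Recover the total from the mean, then replace the wrong value by the right one.
total = n * mean
corrected_total = total - 98 + 94
corrected_mean = corrected_total / n

print(f"original mean  = {mean:.2f} cm")
print(f"corrected mean = {corrected_mean:.2f} cm")
```

Only the mean, the number of observations and the two values involved are needed; the other nine observations never have to be revisited.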
A.M for grouped data

When the data are lengthy, they are usually grouped into classes and $\bar{Y}$ is
calculated by the following formula:

$$\bar{Y} = \frac{f_1 y_1 + f_2 y_2 + \dots + f_k y_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} f_i y_i}{\sum_{i=1}^{k} f_i} \qquad (3.3)$$

where k is the number of classes, $y_i$ is the mid point of the ith class and $f_i$ is the
corresponding frequency. Here it is assumed that each observation is equal
to the mid point of the class in which it occurs. As a result, the arithmetic mean
calculated from grouped data differs slightly from the one calculated from the
ungrouped data. This difference is called grouping error.
Example 3.2: Find the arithmetic mean of the grouped student height data given in
Table 2.1.

Solution: The first two columns give the frequency distribution; columns 3
and 4 are used to calculate the mean according to the definition.

Classes | Frequency $f_i$ | Mid point $y_i$ | $f_i y_i$
86-90   |  6 |  88 | 528
91-95   |  4 |  93 | 372
96-100  | 10 |  98 | 980
101-105 |  6 | 103 | 618
106-110 |  3 | 108 | 324
111-115 |  1 | 113 | 113
Total   | 30 | --- | 2935

$\sum f_i y_i = 2935$, $\sum f_i = 30$

$$\bar{Y} = \text{Arithmetic mean} = \frac{\sum f_i y_i}{\sum f_i} = \frac{2935}{30} = 97.8333 \text{ cm}$$
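
As a cross-check of formula (3.3), the calculation can be sketched in Python. The frequencies used here are an assumption in the sense that they are the ones implied by the $f_i y_i$ column and the total of 30 (for example 528/88 = 6); the variable names are illustrative.

```python
# Class limits of the grouped height data and the frequencies implied by the
# f*y column of the worked table (e.g. 528 / 88 = 6).
classes = [(86, 90), (91, 95), (96, 100), (101, 105), (106, 110), (111, 115)]
freqs = [6, 4, 10, 6, 3, 1]

# Mid point of each class: average of its lower and upper class limits.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

sum_fy = sum(f * y for f, y in zip(freqs, midpoints))
n = sum(freqs)
grouped_mean = sum_fy / n

print(f"sum f*y = {sum_fy}, n = {n}, mean = {grouped_mean:.4f} cm")
```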
3.2.2 Properties of arithmetic mean
i) The algebraic sum of the deviations of the observations from their mean is zero,
i.e.,

$$\sum_{i=1}^{n}(Y_i - \bar{Y}) = 0 \qquad (3.4)$$

It can be proved as:

$$\sum_{i=1}^{n}(Y_i - \bar{Y}) = \sum_{i=1}^{n} Y_i - n\bar{Y} = \sum_{i=1}^{n} Y_i - n\,\frac{\sum_{i=1}^{n} Y_i}{n} = 0$$

since $\bar{Y}$ is constant for any specific values of $Y_i$. Similarly, for grouped data,

$$\sum_{i=1}^{k} f_i (y_i - \bar{Y}) = \sum_{i=1}^{k} f_i y_i - \bar{Y}\sum_{i=1}^{k} f_i = \sum_{i=1}^{k} f_i y_i - \frac{\sum f_i y_i}{\sum f_i}\sum_{i=1}^{k} f_i = 0$$

Numerically, it can be easily seen from the following sample:

$y_i$             |  2 |  4 | 5 | 6 | 7 | 6 | Total 30
$y_i - \bar{y}$   | -3 | -1 | 0 | 1 | 2 | 1 | Total 0

$$\bar{y} = \frac{2+4+5+6+7+6}{6} = \frac{30}{6} = 5$$

We see that $\sum (y_i - \bar{y}) = 0$.
ii) Sometimes it is desirable to calculate the combined mean of two or more
samples using the individual sample means and their sample sizes. It is
denoted by $\bar{Y}_c$. The combined mean is calculated as:

$$\bar{Y}_c = \frac{n_1\bar{Y}_1 + n_2\bar{Y}_2 + \dots + n_k\bar{Y}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{Y}_i}{\sum_{i=1}^{k} n_i} \qquad (3.5)$$

e.g., the combined mean for two groups is given by

$$\bar{Y}_c = \frac{n_1\bar{Y}_1 + n_2\bar{Y}_2}{n_1 + n_2}$$

If $\bar{Y}_1 = 3$ with $n_1 = 3$ and $\bar{Y}_2 = 4$ with $n_2 = 2$, then $\bar{Y}_c$ is given by

$$\bar{Y}_c = \frac{3(3) + 2(4)}{3 + 2} = \frac{17}{5} = 3.40$$

Similarly, $\bar{Y}_c$ can be calculated for more than two groups.


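iii) The sum of squares of the deviations of the observations from their mean
is minimum, i.e.,

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 \text{ is minimum} \qquad (3.6)$$

for ungrouped data, and $\sum_{i=1}^{k} f_i (y_i - \bar{Y})^2$ is minimum for a frequency distribution.

It means that when we take the sum of squares of the deviations from any
value a other than $\bar{Y}$, then

$$\sum_{i=1}^{n}(Y_i - a)^2 > \sum_{i=1}^{n}(Y_i - \bar{Y})^2$$

It can be proved as:

$$\sum_{i=1}^{n}(Y_i - a)^2 = \sum_{i=1}^{n}\left[(Y_i - \bar{Y}) + (\bar{Y} - a)\right]^2$$

$$= \sum_{i=1}^{n}(Y_i - \bar{Y})^2 + n(\bar{Y} - a)^2 + 2(\bar{Y} - a)\sum_{i=1}^{n}(Y_i - \bar{Y})$$

$$= \sum_{i=1}^{n}(Y_i - \bar{Y})^2 + n(\bar{Y} - a)^2, \quad \text{as } \sum_{i=1}^{n}(Y_i - \bar{Y}) = 0$$

Now $\sum (Y_i - a)^2$ is the sum of two terms, neither of which can be negative
because both are built from squares, and $n(\bar{Y} - a)^2$ is strictly positive
whenever $a \neq \bar{Y}$. So $\sum (Y_i - a)^2$ is greater than the single term
$\sum (Y_i - \bar{Y})^2$, i.e.,

$$\sum_{i=1}^{n}(Y_i - a)^2 > \sum_{i=1}^{n}(Y_i - \bar{Y})^2$$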
iv) The mean is affected by a change of origin. If a constant a is added to each
of the observations $Y_1, Y_2, \dots, Y_n$ having mean $\bar{Y}$, then the mean increases by
that constant. Adding a to all the observations gives $a + Y_1, a + Y_2, \dots, a + Y_n$,
and their mean is $a + \bar{Y}$. For a = 10, the mean would be $10 + \bar{Y}$.
Similarly, if a is subtracted from each $Y_i$, then the mean is $\bar{Y} - a$.

v) The mean is affected by a change of scale. If $Y_1, Y_2, \dots, Y_n$ have mean $\bar{Y}$,
then the mean after multiplying each observation by a constant a is the mean
multiplied by that constant. The mean of $aY_1, aY_2, \dots, aY_n$ is

$$\frac{aY_1 + aY_2 + \dots + aY_n}{n} = \frac{a(Y_1 + Y_2 + \dots + Y_n)}{n} = \frac{a\sum_{i=1}^{n} Y_i}{n} = a\bar{Y}$$

If $Y_1, Y_2, \dots, Y_n$ are multiplied by 10, then the resulting mean would be $10\bar{Y}$.
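
The five properties can be verified numerically on the small sample used for property (i). The sketch below is only a check, not a proof; the helper names `combined_mean` and `ss` are hypothetical, and the trial value 4.5 in the least-squares check is an arbitrary choice.

```python
ys = [2, 4, 5, 6, 7, 6]  # small sample used for property (i)
n = len(ys)
mean = sum(ys) / n

# (i) Deviations from the mean sum to zero.
dev_sum = sum(y - mean for y in ys)


# (ii) Combined mean from group means and group sizes, as in (3.5).
def combined_mean(means, sizes):
    return sum(m * k for m, k in zip(means, sizes)) / sum(sizes)


# (iii) Sum of squared deviations about an arbitrary value a.
def ss(values, a):
    return sum((y - a) ** 2 for y in values)


# (iv) and (v) Change of origin and change of scale.
shifted_mean = sum(y + 10 for y in ys) / n
scaled_mean = sum(10 * y for y in ys) / n

print("sum of deviations:", dev_sum)
print("combined mean:", combined_mean([3, 4], [3, 2]))
print("SS about mean:", ss(ys, mean), " SS about 4.5:", ss(ys, 4.5))
print("shifted mean:", shifted_mean, " scaled mean:", scaled_mean)
```

The gap between the two sums of squares equals $n(\bar{Y} - a)^2$, which is the extra term appearing in the proof of property (iii).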
3.2.3 Calculation of A.M. by coding / short-cut method 


The arithmetic mean may be calculated by the following formula:

$$\bar{Y} = a + \frac{\sum_{i=1}^{n} D_i}{n} \qquad (3.7)$$

where $D_i = (Y_i - a)$ and a is an arbitrary value called the provisional mean. The
relation (3.7) can be derived as follows:

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{\sum_{i=1}^{n}(Y_i - a + a)}{n} = \frac{na + \sum_{i=1}^{n}(Y_i - a)}{n} = a + \frac{\sum_{i=1}^{n} D_i}{n}$$

where $Y_i - a = D_i$.

This is for ungrouped data. Similarly, for grouped data,

$$\bar{Y} = a + \frac{\sum_{i=1}^{k} f_i D_i}{\sum_{i=1}^{k} f_i} \qquad (3.8)$$

If the class intervals are equal, then the arithmetic mean may also be calculated as:

$$\bar{Y} = a + \frac{\sum_{i=1}^{k} f_i u_i}{\sum_{i=1}^{k} f_i} \times h \qquad (3.9)$$

where $u_i = \frac{Y_i - a}{h}$ and h is the common width of the classes.


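Example 3.3: Find the A.M. for the data of Examples 3.1 and 3.2 by the short-cut
method.
Solution: Taking a = 90 as an arbitrary origin for the data of Example 3.1:

$Y_i$              | 87 | 91 | 89 | 88 | 89 | 91 | 87 | 92 | 90 | 98 | Total 902
$D_i = Y_i - 90$   | -3 |  1 | -1 | -2 | -1 |  1 | -3 |  2 |  0 |  8 | Total 2

$$\bar{Y} = a + \frac{\sum D_i}{n} = 90 + \frac{2}{10} = 90.20 \text{ cm}$$

Arithmetic mean for the grouped data of Example 3.2, taking a = 98:

$y_i$              |  88 |  93 | 98 | 103 | 108 | 113 | Total
$f_i$              |   6 |   4 | 10 |   6 |   3 |   1 | 30
$D_i = y_i - 98$   | -10 |  -5 |  0 |   5 |  10 |  15 | ---
$f_i D_i$          | -60 | -20 |  0 |  30 |  30 |  15 | -5

$$\bar{Y} = a + \frac{\sum f_i D_i}{\sum f_i} = 98 + \frac{-5}{30} = 98 - 0.1667 = 97.8333 \text{ cm}$$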
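
The short-cut formulas (3.7) and (3.9) can be sketched in Python as follows. The function names are hypothetical, and the grouped frequencies are assumed to be those implied by the $f_i y_i$ column of Example 3.2.

```python
def shortcut_mean(values, a):
    """Formula (3.7): provisional mean a plus the mean of the deviations."""
    deviations = [y - a for y in values]
    return a + sum(deviations) / len(values)


def shortcut_mean_grouped(midpoints, freqs, a, h):
    """Formula (3.9): coded deviations u = (y - a) / h with common width h."""
    u = [(y - a) / h for y in midpoints]
    return a + h * sum(f * ui for f, ui in zip(freqs, u)) / sum(freqs)


heights = [87, 91, 89, 88, 89, 91, 87, 92, 90, 98]   # Example 3.1
midpoints = [88, 93, 98, 103, 108, 113]              # Example 3.2
freqs = [6, 4, 10, 6, 3, 1]                          # implied by the f*y column

print(f"{shortcut_mean(heights, 90):.2f}")
print(f"{shortcut_mean_grouped(midpoints, freqs, a=98, h=5):.4f}")
```

Any choice of a gives the same mean; a value near the centre of the data simply keeps the deviations small, which is what made the method attractive for hand calculation.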


3.2.4 Merits of arithmetic mean 


i) It is rigidly defined by a mathematical formula.
ii) It is easy to calculate.
iii) It is easy to understand.
iv) It is based upon all the values.
v) It is a stable statistic in repeated sampling experiments.
vi) The sum of the observations can be found if the mean and the number of
observations are known.

3.2.5 Demerits of arithmetic mean

i) It is very sensitive to any marked departure from the bell-shaped
distribution and hence is not suitable for skewed distributions.
ii) It gives fallacious and misleading conclusions when there is too
much variation in the data.
iii) It is greatly affected by extreme values.
iv) It cannot be calculated for open-end classes without assuming
the open ends.


3.2.6 Weighted Mean 


The arithmetic mean is used when all the observations are given equal importance,
but there are situations in which different observations get different
weights. In these situations the weighted mean, denoted by $\bar{Y}_w$, is preferred. The weighted
mean of $Y_1, Y_2, \dots, Y_n$ with corresponding weights $w_1, w_2, \dots, w_n$ is calculated as:

$$\bar{Y}_w = \frac{w_1 Y_1 + w_2 Y_2 + \dots + w_n Y_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_i Y_i}{\sum_{i=1}^{n} w_i} \qquad (3.10)$$





Example 3.4: The following data are about the percentage kill ($Y_i$) and the number of
insects ($w_i$) used in a study. The interest is to calculate the mean of the percentage
kill.

$Y_i$ | 88 | 85.7 | 52.1 | 33.3 | 12
$w_i$ | 44 | 42   | 24   | 16   | 6

Solution: Weighted mean $= \bar{Y}_w = \frac{\sum w_i Y_i}{\sum w_i}$

Now $\sum w_i Y_i$ = 44(88) + 42(85.7) + 24(52.1) + 16(33.3) + 6(12) = 9326.6
and $\sum w_i$ = 44 + 42 + 24 + 16 + 6 = 132,

therefore $\bar{Y}_w = \frac{9326.6}{132} = 70.65606\%$
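
A minimal Python sketch of formula (3.10) for these data is given below; it also prints the unweighted mean of the five percentages to show how much the weighting matters. The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Formula (3.10): sum of w*Y divided by the sum of the weights."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


kill = [88, 85.7, 52.1, 33.3, 12]   # percentage kill Y_i
insects = [44, 42, 24, 16, 6]       # number of insects w_i

weighted = weighted_mean(kill, insects)
unweighted = sum(kill) / len(kill)

print(f"weighted mean   = {weighted:.4f} %")
print(f"unweighted mean = {unweighted:.4f} %")
```

The unweighted mean is pulled down by the two low percentages even though they are based on few insects, which is why the weighted mean is preferred here.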


3.3 Geometric Mean 


The geometric mean is a useful measure of central tendency for positive values. It
is appropriate for averaging rates and ratios. It may be appropriately calculated only
for ratio scale data.

The geometric mean is defined as the nth root of the product of n positive
numbers. If we have n positive values $Y_1, Y_2, \dots, Y_n$, then the geometric mean, denoted
by G.M, is defined by

$$G.M = \sqrt[n]{Y_1 \times Y_2 \times \dots \times Y_n} = (Y_1 \times Y_2 \times \dots \times Y_n)^{1/n} \qquad (3.11)$$

Taking the log of both sides, we get

$$\log G.M = \frac{1}{n}(\log Y_1 + \log Y_2 + \dots + \log Y_n) = \frac{1}{n}\sum_{i=1}^{n}\log Y_i$$

or

$$G.M = \text{antilog}\left[\frac{1}{n}\sum_{i=1}^{n}\log Y_i\right] \qquad (3.12)$$
This measure is useful when dealing with relative values, such as finding the
average of percentage changes, independent ratios and index numbers. The formula
for grouped data is:

$$G.M = \left[(Y_1)^{f_1} \times (Y_2)^{f_2} \times \dots \times (Y_k)^{f_k}\right]^{1/n} \qquad (3.13)$$

where $n = \sum_{i=1}^{k} f_i$. Taking the log, we get

$$\log(G.M) = \frac{1}{n}\left[f_1 \log Y_1 + f_2 \log Y_2 + \dots + f_k \log Y_k\right] = \frac{1}{n}\sum_{i=1}^{k} f_i \log Y_i$$

or

$$G.M = \text{antilog}\left[\frac{1}{n}\sum_{i=1}^{k} f_i \log Y_i\right] \qquad (3.14)$$


Example 3.5: Calculate the geometric mean for the following ungrouped data on the
percentage changes in the weight of eight animals.
45, 30, 35, 40, 44, 32, 42, 37
Solution: We know that

$$G.M = \text{antilog}\left[\frac{1}{n}\sum_{i=1}^{n}\log Y_i\right]$$

$$\log(G.M) = \frac{1}{8}\left[\log 45 + \log 30 + \log 35 + \log 40 + \log 44 + \log 32 + \log 42 + \log 37\right]$$

$$= \frac{1}{8}\left[1.6532 + 1.4771 + 1.5441 + 1.6021 + 1.6434 + 1.5051 + 1.6232 + 1.5682\right]$$

$$= \frac{12.6164}{8} = 1.57705$$

G.M = antilog(1.57705) = 37.7616


Example 3.6: Compute the G.M by using the basic definition for the observations:
0.5, 10.0, 2.7, 3.48, 4.7

Solution: The geometric mean of five observations is given by

$$G.M = (Y_1 \times Y_2 \times Y_3 \times Y_4 \times Y_5)^{1/5} = \left[(0.5)(10.0)(2.7)(3.48)(4.7)\right]^{1/5} = 2.9430$$
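
Both routes to the geometric mean, the basic definition (3.11) and the logarithmic form (3.12), can be sketched in Python. The function names are hypothetical; the check for non-positive values is an added safeguard, since the definition applies to positive numbers only.

```python
import math


def gm_by_definition(values):
    """Formula (3.11): n-th root of the product of n positive numbers."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.prod(values) ** (1 / len(values))


def gm_by_logs(values):
    """Formula (3.12): antilog of the mean of the base-10 logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return 10 ** (sum(math.log10(v) for v in values) / len(values))


weights_change = [45, 30, 35, 40, 44, 32, 42, 37]   # Example 3.5
observations = [0.5, 10.0, 2.7, 3.48, 4.7]          # Example 3.6

print(f"{gm_by_logs(weights_change):.2f}")
print(f"{gm_by_definition(observations):.4f}")
```

With full-precision logarithms the result for Example 3.5 differs from the hand calculation only in the third decimal place, because the table logarithms are rounded to four figures.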


Example 3.7: The grouped data available on insect population growth, giving the
age classes and the corresponding frequencies, are:


| Cheer pondaries fe tanh tn 





Eamopa Nae ae wuon| Tanna ae aa 


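Find the geometric mean for the above data.

Solution: $G.M = \text{antilog}\left[\frac{1}{n}\sum_{i=1}^{k} f_i \log Y_i\right]$, where $n = \sum f_i$.

To compute the G.M. we calculate columns 3, 4 and 5 of the table, which give

$\sum f_i \log Y_i = 36.2334$ and $\sum f_i = 34$

thus G.M. $= \text{antilog}\left(\frac{36.2334}{34}\right)$
= antilog (1.0657)
= 11.6329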


Example 3.8: A man gets a rise of 10% in salary at the end of his first year of service
and further rises of 20% and 25% at the end of the second and third years respectively,
the rise in each case being calculated on his salary at the beginning of the year. To
what annual percentage increase is this equivalent?

Solution: Take the salary at the beginning of each year as 100. The salary at the
end of each year, relative to 100 at the beginning of that year, is then:
First year (rise of 10%) = 100 + 10 = 110
Second year (rise of 20%) = 100 + 20 = 120
Third year (rise of 25%) = 100 + 25 = 125

$$G.M = (110 \times 120 \times 125)^{1/3} = 118.17$$

Annual percentage increase = 118.17 - 100 = 18.17%
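
The reason the geometric mean is the right average here is that salary rises compound. The sketch below, with illustrative variable names, computes the equivalent annual rate from the three growth factors and confirms that three years at that single rate reproduce the actual three-year growth.

```python
import math

# Yearly growth factors for rises of 10%, 20% and 25%.
factors = [1.10, 1.20, 1.25]

# Geometric mean of the factors gives the equivalent constant yearly factor.
gm_factor = math.prod(factors) ** (1 / len(factors))
annual_rate = (gm_factor - 1) * 100

# Compounding at the constant factor must match the actual three-year growth.
actual_growth = math.prod(factors)
equivalent_growth = gm_factor ** len(factors)

print(f"equivalent annual increase = {annual_rate:.2f} %")
print(f"actual growth = {actual_growth:.4f}, equivalent growth = {equivalent_growth:.4f}")
```

The arithmetic mean of the three rises, (10 + 20 + 25)/3, is larger than this rate and would overstate the final salary if compounded for three years.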


Example 3.9: The frequency distribution given below has been derived from the use 
of working origin. If D = Y—-18, find Arithmetic Mean and Geometric Mean. 


are eo ee ee 
cs fin, ets Qa CAIUS eh AB IAD cov IB vei’ 


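Solution: Here, D = Y - 18, or Y = D + 18. The column of log Y values is:

0.77815, 1.00000, 1.14613, 1.25527, 1.34242, 1.41497, 1.47712, 1.53148

and the column totals are $\sum f = 80$, $\sum fY = 1696$ and $\sum f \log Y = 104.19097$.

$$\text{Arithmetic Mean} = \bar{Y} = \frac{\sum fY}{\sum f} = \frac{1696}{80} = 21.2$$

$$G.M = \text{antilog}\left[\frac{\sum f \log Y}{\sum f}\right] = \text{antilog}\left[\frac{104.19097}{80}\right] = 20.06$$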
3.3.1 Properties of geometric mean 
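i) If there are k sets with $n_1, n_2, \dots, n_k$ observations and $G_1, G_2, \dots, G_k$
as their geometric means, then the combined geometric mean $G_{comb}$ of the
total observations is given by

$$G_{comb} = \text{antilog}\left[\frac{\sum_{i=1}^{k} n_i \log G_i}{\sum_{i=1}^{k} n_i}\right] \qquad (3.15)$$

ii) If there are two sets each consisting of n positive observations, $Y_1, Y_2, \dots, Y_n$
with geometric mean $G_1$ and $X_1, X_2, \dots, X_n$ with geometric mean $G_2$,
then the geometric mean G of the ratios Z = Y/X is the ratio of their
geometric means, i.e.,

$$G = \frac{G_1}{G_2} \qquad (3.16)$$

since

$$G = \left[\frac{Y_1}{X_1} \times \frac{Y_2}{X_2} \times \dots \times \frac{Y_n}{X_n}\right]^{1/n} = \frac{(Y_1 \times Y_2 \times \dots \times Y_n)^{1/n}}{(X_1 \times X_2 \times \dots \times X_n)^{1/n}} = \frac{G_1}{G_2}$$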
3.3.2 Merits of geometric mean
i) It is rigidly defined by a mathematical formula.
ii) It is based on all the observations.
iii) It is capable of mathematical development.
iv) It is less affected by extreme values than the arithmetic mean.

3.3.3 Demerits of geometric mean
i) It becomes zero if any of the observations is zero.
ii) It is sensible only for positive values and it may become imaginary for
negative values.
3.4 Harmonic mean
The harmonic mean is particularly useful when dealing with the averages of
certain types of rates and ratios. The harmonic mean of n values $Y_1, Y_2, \dots, Y_n$ is defined
as the reciprocal of the arithmetic mean of the reciprocals of the values. The harmonic
mean is denoted by H.M and is defined by:

$$H.M = \text{Reciprocal of } \frac{1}{n}\sum_{i=1}^{n}\frac{1}{Y_i}, \quad \text{where } Y_i \neq 0 \qquad (3.17)$$

$$= \frac{n}{\sum_{i=1}^{n}\frac{1}{Y_i}} \qquad (3.18)$$

The harmonic mean for grouped data is given by

$$H.M = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k}\frac{f_i}{Y_i}} \qquad (3.19)$$

The harmonic mean deals with rates that are independent of each other.


Example 3.10: A tractor is running at the rate of 10 km/hr during the first 60 km,
at 20 km/hr during the second 60 km, 30 km/hr during the third 60 km, 40 km/hr
during the fourth 60 km and 50 km/hr during the fifth (last) 60 km. What would be
the average speed?
Solution: The harmonic mean of the values gives the average speed.

$$H.M = \text{Reciprocal of } \frac{1}{n}\sum_{i=1}^{n}\frac{1}{Y_i} = \text{Reciprocal of } \frac{1}{5}\left[\frac{1}{10} + \frac{1}{20} + \frac{1}{30} + \frac{1}{40} + \frac{1}{50}\right]$$

$$= \frac{5}{\frac{1}{10} + \frac{1}{20} + \frac{1}{30} + \frac{1}{40} + \frac{1}{50}} = \frac{5}{0.22833} = 21.898 \text{ km/hr}$$
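
That the harmonic mean is the correct average speed can be confirmed by computing total distance divided by total time. The Python sketch below does both; the function name `harmonic_mean` is illustrative.

```python
def harmonic_mean(values):
    """Formula (3.18): n divided by the sum of the reciprocals."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return len(values) / sum(1 / v for v in values)


speeds = [10, 20, 30, 40, 50]    # km/hr over five equal stretches
stretch = 60                     # km covered at each speed

hm = harmonic_mean(speeds)

# Direct check: average speed = total distance / total time.
total_distance = stretch * len(speeds)
total_time = sum(stretch / s for s in speeds)
average_speed = total_distance / total_time

print(f"harmonic mean = {hm:.3f} km/hr")
print(f"distance/time = {average_speed:.3f} km/hr")
```

The two results agree because the distances are equal; the arithmetic mean of the speeds, 30 km/hr, would overstate the average because more time is spent at the lower speeds.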





Example 3.11: Find the harmonic mean for the grouped data of Example 3.7.

Solution: We know that $H.M = \frac{\sum f_i}{\sum \frac{f_i}{Y_i}}$

From the data of Example 3.7,

$\sum f_i = 34$ and $\sum \frac{f_i}{Y_i} = 3.7137$

$$H.M = \frac{34}{3.7137} = 9.1553$$

Example 3.12: Calculate Harmonic Mean and Geometric Mean from the following 
data: 
3.8, S50. 18 


Solution: It is not possible to calculate Geometric Mean and Harmonic Mean as the 
data involves the value 0 because while calculating Geometric Mean, the 
multiplication with 0 makes product of the given values zero (0) and while 
calculating Harmonic Mean, division by zero (0) is undefined. 

3.4.1 Properties of Harmonic Mean

i) If there are k sets with $n_1, n_2, \dots, n_k$ observations and $H_1, H_2, \dots, H_k$ as
their harmonic means, then the combined harmonic mean $H.M_{comb}$ of all
the observations is given by:

$$H.M_{comb} = \frac{\sum_{i=1}^{k} n_i}{\sum_{i=1}^{k}\frac{n_i}{H_i}} \qquad (3.20)$$

3.4.2 Merits of harmonic mean
i) It is defined by a mathematical formula.
ii) It is based on all the observations.
iii) It is capable of further mathematical development.

3.4.3 Demerits of Harmonic Mean
i) It cannot be calculated if any of the observations is zero.
ii) It is not as simple to calculate as the arithmetic mean.
iii) It gives less weight to large values and more weight to small values.

3.4.4 General relationship between A.M., G.M. and H.M.
If $Y_1, Y_2, \dots, Y_n$ are n positive observations, then the arithmetic mean,
geometric mean and harmonic mean satisfy the following relation:

A.M ≥ G.M ≥ H.M

The three means are equal only if all the observations are identical.
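
The relation can be illustrated with Python's standard `statistics` module, which provides `mean`, `geometric_mean` and `harmonic_mean`. The sketch below uses the data of Example 3.5 and, as an assumed extra case, a set of identical values; `three_means` is a hypothetical helper.

```python
from math import isclose
from statistics import geometric_mean, harmonic_mean, mean


def three_means(values):
    """Return (A.M, G.M, H.M) for a set of positive observations."""
    return mean(values), geometric_mean(values), harmonic_mean(values)


# Data of Example 3.5: the three means are strictly ordered.
am, gm, hm = three_means([45, 30, 35, 40, 44, 32, 42, 37])
print(f"A.M = {am:.3f}, G.M = {gm:.3f}, H.M = {hm:.3f}")

# Identical observations: the three means coincide.
same = three_means([7, 7, 7, 7])
print(all(isclose(m, 7) for m in same))
```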
3.5 Median

The median is the value which divides data arranged in order of magnitude into two
equal parts. For an odd number of observations, the median is the value of the
$\left(\frac{n+1}{2}\right)$th item; for an even number of observations, the median is the
mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th items of the set of
values arranged in ascending or descending order of magnitude. In other words, the
median is the middle value if the number of values is odd and the
mean of the two middle values if the number of values is even.


Example 3.13: Following are the heights (cm) of 5 students measured at the time of
registration. Find the median. Also find the median for the data of Example 3.1.

$Y_i$: 88.03, 94.50, 94.90, 95.05, 84.60
Solution: The ordered observations are:
84.60, 88.03, 94.50, 94.90, 95.05

Here n = 5, so

Median = value of the $\left(\frac{5+1}{2}\right)$th observation
= value of the 3rd observation
= 94.50

or Median $= Y_{\left(\frac{n+1}{2}\right)} = Y_{(3)}$, the third value in the ordered observations,
= 94.50 cm

The data of Example 3.1 are used to calculate the median for even n. The
ordered observations are:

87, 87, 88, 89, 89, 90, 91, 91, 92, 98.

Median = value of the $\left(\frac{n+1}{2}\right)$th observation
= value of the $\left(\frac{10+1}{2}\right)$th observation
= value of the (5.5)th observation
= 5th obs + 0.5 (6th obs - 5th obs)
= 89 + 0.5 (90 - 89)
= 89 + 0.5 (1) = 89.5 cm

or

$$\text{Median} = \frac{Y_{\left(\frac{n}{2}\right)} + Y_{\left(\frac{n}{2}+1\right)}}{2}$$

So the median is the mean of $Y_{(5)}$ and $Y_{(6)}$:

$$\text{Median} = \frac{Y_{(5)} + Y_{(6)}}{2} = \frac{89 + 90}{2} = 89.5 \text{ cm}$$
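
The definition of the median for odd and even n can be sketched in Python as follows. The function name `median_of` is illustrative; the standard library's `statistics.median` follows the same rule.

```python
def median_of(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)-th item
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)-th and (n/2 + 1)-th items


five_heights = [88.03, 94.50, 94.90, 95.05, 84.60]     # odd n
ten_heights = [87, 91, 89, 88, 89, 91, 87, 92, 90, 98]  # even n (Example 3.1)

print(median_of(five_heights))
print(median_of(ten_heights))
```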





For grouped data (given in ascending order) the median is calculated by the
relation:

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right) \qquad (3.21)$$

where l is the lower class boundary of the class containing the median,
h is the width of the class containing the median,
f is the frequency of the class containing the median,
$\frac{n}{2}$ is used to locate the median class, i.e., the class in which the
$\left(\frac{n}{2}\right)$th observation falls; this is done by looking at the cumulative
frequencies and finding the class in which the $\left(\frac{n}{2}\right)$th observation lies,
and c is the cumulative frequency of the class preceding the median class.


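Example 3.14: Find the median for the following student height grouped data.

Class boundaries | Frequency (f) | Cumulative frequency (c.f.)
85.5 - 90.5      |  6 |  6
90.5 - 95.5      |  4 | 10
95.5 - 100.5     | 10 | 20
100.5 - 105.5    |  6 | 26
105.5 - 110.5    |  3 | 29
110.5 - 115.5    |  1 | 30

Solution: To find the median class, we locate the class in which the
$\left(\frac{n}{2}\right)$th observation falls:

$\left(\frac{n}{2}\right)$th observation = $\left(\frac{30}{2}\right)$th observation = 15th observation

The 15th observation falls in the class 95.5 - 100.5.
So, median class = 95.5 - 100.5

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right) = 95.5 + \frac{5}{10}(15 - 10) = 98.0 \text{ cm}$$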


When the data are discrete but grouped, the median is calculated by using
the formal definition of the median.

Example 3.15: Data on 26 cotton plants are taken and the number
of bolls per plant is observed. The data are grouped as follows:

Number of bolls  | 0 | 1 | 2 | 4 | 5 | 6 | 7
Number of plants | 5 | 6 | 3 | 6 | 3 | 2 | 1

Solution: As n is even, the median is the mean of the $Y_{\left(\frac{n}{2}\right)}$ and $Y_{\left(\frac{n}{2}+1\right)}$ observations:

$Y_{\left(\frac{n}{2}\right)} = Y_{(13)}$, $Y_{\left(\frac{n}{2}+1\right)} = Y_{(14)}$

To locate $Y_{(13)}$ and $Y_{(14)}$ we need to make a cumulative frequency column:

Number of bolls | Number of plants | Cumulative frequency (c.f.)
0 | 5 |  5
1 | 6 | 11
2 | 3 | 14
4 | 6 | 20
5 | 3 | 23
6 | 2 | 25
7 | 1 | 26

From the c.f. column it is clear that $Y_{(13)}$ and $Y_{(14)}$ are plants which have a
number of bolls equal to 2. Thus 2 is the median.
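
Locating ordered observations through the cumulative frequency column can be sketched in Python as below. The helper `kth_value` is hypothetical; it returns the value attached to the first cumulative frequency that reaches k.

```python
from bisect import bisect_left
from itertools import accumulate

bolls = [0, 1, 2, 4, 5, 6, 7]    # number of bolls per plant
plants = [5, 6, 3, 6, 3, 2, 1]   # number of plants (frequencies)

cum_freq = list(accumulate(plants))   # cumulative frequency column
n = cum_freq[-1]


def kth_value(k):
    """Value of the k-th ordered observation, counting from 1."""
    return bolls[bisect_left(cum_freq, k)]


if n % 2 == 0:
    # Even n: mean of the (n/2)-th and (n/2 + 1)-th ordered observations.
    median_bolls = (kth_value(n // 2) + kth_value(n // 2 + 1)) / 2
else:
    median_bolls = kth_value((n + 1) // 2)

print(cum_freq)
print(median_bolls)
```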


3.5.1 Properties of median
i) If a constant a is added to each of the n observations $Y_1, Y_2, \dots, Y_n$ having
median M, then the median of $a + Y_1, a + Y_2, \dots, a + Y_n$ would be a + M.
If each of the n observations is multiplied by a, then the median of
$aY_1, aY_2, \dots, aY_n$ would be aM.

ii) The sum of absolute deviations of the observations from their median is
minimum, i.e.,

$$\sum |Y_i - \text{median}| \text{ is minimum} \qquad (3.22)$$

The bars indicate absolute value, meaning that all the deviations are taken
as positive.
iii) For a symmetrical distribution the median is equidistant from the first and
third quartiles, i.e.,

$$Q_3 - \text{Median} = \text{Median} - Q_1 \qquad (3.23)$$

where $Q_1$ and $Q_3$ are the first and third quartiles respectively.
3.5.2 Merits of median
i) It is quick to find.
ii) It is not much affected by exceptionally large or small values in the data.
iii) It is suitable for skewed distributions.
3.5.3 Demerits of median
i) It is not rigidly defined.
ii) It is not readily suitable for algebraic development.
iii) It is less stable in repeated sampling experiments than the mean.
iv) It is not based on all the observations.
3.6 Quantiles

Sometimes our interest is to know the position of an observation relative to the
others in a data set. For example, in the grouped student height data of Example 2.1,
we may be interested to know the percentage of students having height less than some
specified value. The measures used for this purpose are called quantiles or fractiles.
These are usually calculated under the following headings:

i) Quartiles and Deciles ii) Percentiles
3.6.1 Quartiles

Quartiles are the values in the order statistic that divide the data into four
equal parts. These are the first quartile $Q_1$, the second quartile $Q_2$ (median) and the third
quartile $Q_3$. The first quartile, also known as the lower quartile, is the value in the order
statistic that exceeds 1/4 of the observations and is less than the remaining 3/4 of the
observations. The second quartile is the median. The third quartile, known as the upper
quartile, is the value in the order statistic that exceeds 3/4 of the observations and is
less than the remaining 1/4 of the observations.

In the case of ungrouped data, the quartiles are calculated by splitting the order
statistic at the median and calculating the median of each of the two halves. If n is odd, the
median can be included in both halves.

Example 3.16: Find the quartiles for the ungrouped data of the example 3.13.
Solution: We know that the median of the data is the middle value of the order statistic. For
finding the quartiles, we split the order statistic at the median and calculate the median of
the two halves. Since n is odd, we can include the median in both halves.
The order statistic is
84.60, 88.03, 94.50, 94.90, 95.05
$Q_2 = \text{Median} = Y_{\left(\frac{n+1}{2}\right)} = Y_{(3)}$, the third observation
= 94.50

$Q_1$ = Median of the first three values $= Y_{(2)}$, the second observation
= 88.03

$Q_3$ = Median of the last three values $= Y_{(4)}$, the fourth observation
= 94.90
For the grouped data (in ascending order) the quartiles are calculated as:
$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$    (3.24)
$Q_2 = l + \frac{h}{f}\left(\frac{2n}{4} - c\right)$    (3.25)
$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$    (3.26)
Where l is the lower class boundary of the class containing $Q_1$, $Q_2$ or $Q_3$.
h is the width of the class containing $Q_1$, $Q_2$ or $Q_3$.
f is the frequency of the class containing $Q_1$, $Q_2$ or $Q_3$.
c is the cumulative frequency of the class immediately preceding the
class containing $Q_1$, $Q_2$ or $Q_3$.
$\frac{n}{4}$, $\frac{2n}{4}$ or $\frac{3n}{4}$ are used to locate the $Q_1$, $Q_2$ or $Q_3$ group.
Example 3.17: Find quartiles for data of the example 3.2.
Solution:

Class boundaries      f     Cumulative frequency
85.5 - 90.5           6      6
90.5 - 95.5           4     10
95.5 - 100.5         10     20
100.5 - 105.5         6     26
105.5 - 110.5         3     29
110.5 - 115.5         1     30

To locate the class containing $Q_1$,
$\frac{n}{4}$th observation $= \frac{30}{4}$th observation
= 7.5th observation
The 7.5th observation falls in the group 90.5 - 95.5.
So, $Q_1$ group = 90.5 - 95.5
$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$
$= 90.5 + \frac{5}{4}(7.5 - 6)$
= 92.3750 cm

For $Q_2$,
$\frac{2n}{4}$th observation $= \frac{2(30)}{4}$th observation
= 15th observation, which falls in the group 95.5 - 100.5
So, $Q_2 = l + \frac{h}{f}\left(\frac{2n}{4} - c\right)$
$= 95.5 + \frac{5}{10}(15 - 10)$
= 98 cm

For $Q_3$,
$\frac{3n}{4}$th observation $= \frac{3 \times 30}{4}$th observation
= 22.5th observation
So, $Q_3$ group = 100.5 - 105.5
$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$
$= 100.5 + \frac{5}{6}(22.5 - 20)$
= 100.5 + 2.0833
= 102.5833 cm
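The grouped-data formulas (3.24) to (3.26) share one pattern, so they can be sketched as a single routine. This is an illustrative sketch only: the function name grouped_quantile is hypothetical, and the class frequencies 6, 4, 10, 6, 3, 1 are an assumption, taken as those implied by the worked figures of example 3.17.

```python
def grouped_quantile(boundaries, freqs, p):
    """Quantile at proportion p (0 < p < 1) from a frequency table.

    boundaries: (lower, upper) class boundaries in ascending order
    freqs:      class frequencies in the same order
    """
    n = sum(freqs)
    target = p * n            # n/4 for Q1, 2n/4 for Q2, 3n/4 for Q3
    c = 0                     # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and c + f >= target:
            h = upper - lower
            return lower + (h / f) * (target - c)
        c += f
    raise ValueError("p must lie between 0 and 1")

# Class boundaries of example 3.17; frequencies assumed as described above.
boundaries = [(85.5, 90.5), (90.5, 95.5), (95.5, 100.5),
              (100.5, 105.5), (105.5, 110.5), (110.5, 115.5)]
freqs = [6, 4, 10, 6, 3, 1]

q1 = grouped_quantile(boundaries, freqs, 0.25)
q2 = grouped_quantile(boundaries, freqs, 0.50)
q3 = grouped_quantile(boundaries, freqs, 0.75)
print(round(q1, 4), round(q2, 4), round(q3, 4))
```

Passing p = m/10 or p = m/100 to the same routine gives the corresponding grouped decile or percentile.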
3.6.2 Deciles 


Deciles are the values in the order statistic that divide the data into ten equal
parts. These are denoted by $D_1, D_2, D_3, \ldots, D_9$. $D_1$ is the value of the order statistic that
exceeds 1/10 of the observations and is less than the remaining 9/10. The fifth decile is
the median, and $D_9$, the ninth decile, is the value in the order statistic that exceeds 9/10
of the observations and is less than the remaining 1/10 of the observations.
Deciles for Ungrouped Data
To calculate deciles for the ungrouped data the following procedure may be
followed.
i) Order the observations.
ii) For the mth decile, determine the product $\frac{mn}{10}$. If $\frac{mn}{10}$ is not an integer,
round it up and find the corresponding ordered value, but if $\frac{mn}{10}$ is an integer,
say k, then calculate the mean of the kth and (k+1)th ordered observations.
Example 3.18: The data of the example 2.1 is used to explain the procedure and $D_7$
and $D_3$ have been calculated.
i) The ordered observations are
87, 87, 88, 89, 89, 90, 91, 91, 92, 95, 96, 96, 97, 98, 98, 98, 99, 99, 100, 100,
101, 101, 102, 103, 105, 105, 106, 107, 107, 112.
Here n = 30
ii) To calculate $D_7$, (7)(30)/10 = 21, so we calculate the mean of the 21st and
22nd observations, i.e., $D_7$ = (101 + 101)/2 = 101.
To calculate $D_3$, (3)(30)/10 = 9, so we calculate the mean of the 9th and 10th
observations, i.e., $D_3$ = (92 + 95)/2 = 93.5.
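The round-up rule above can be sketched in code as follows. The function name ungrouped_quantile and its parts argument are illustrative assumptions; parts=10 gives deciles and parts=100 applies the same rule to percentiles.

```python
import math

def ungrouped_quantile(values, m, parts):
    """m-th quantile (0 < m < parts) when the data are split into `parts` equal parts."""
    ordered = sorted(values)
    n = len(ordered)
    if (m * n) % parts == 0:
        k = m * n // parts                         # position is an integer k
        return (ordered[k - 1] + ordered[k]) / 2   # mean of k-th and (k+1)-th values
    k = math.ceil(m * n / parts)                   # otherwise round the position up
    return ordered[k - 1]

# Ordered observations of example 2.1 (n = 30).
heights = [87, 87, 88, 89, 89, 90, 91, 91, 92, 95, 96, 96, 97, 98, 98,
           98, 99, 99, 100, 100, 101, 101, 102, 103, 105, 105, 106, 107, 107, 112]

d7 = ungrouped_quantile(heights, 7, 10)
d3 = ungrouped_quantile(heights, 3, 10)
print(d7, d3)
```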


Deciles for grouped data
The mth decile for grouped data (in ascending order) is
$D_m = l + \frac{h}{f}\left(\frac{mn}{10} - c\right)$    (3.27)
Like the median, $\frac{mn}{10}$ is used to locate the mth decile group.
l is the lower class boundary of the class containing the mth decile.
h is the width of the class containing $D_m$.
f is the frequency of the class containing $D_m$.
n is the total number of frequencies.
c is the cumulative frequency of the class immediately preceding the class
containing $D_m$.
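Example 3.19: Data of the example 3.17 is used to explain the procedure for grouped
data. $D_1$ and $D_7$ are calculated.
Calculation for $D_1$
$\frac{n}{10}$th observation $= \frac{30}{10}$th observation
= 3rd observation
So, $D_1$ group = 85.5 - 90.5
$D_1 = l + \frac{h}{f}\left(\frac{n}{10} - c\right)$
$= 85.5 + \frac{5}{6}(3 - 0)$
= 88.00 cm
Calculation for $D_7$
$\frac{7n}{10}$th observation $= \frac{7 \times 30}{10}$th observation
= 21st observation
So, $D_7$ group = 100.5 - 105.5
$D_7 = l + \frac{h}{f}\left(\frac{7n}{10} - c\right)$
$= 100.5 + \frac{5}{6}(21 - 20)$
= 101.3333 cm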
3.6.3 Percentiles 
These are the measures of relative standing of an observation within a data set.
The pth percentile is the value $Y_{(p)}$ in the order statistic such that p percent of the
values are less than the value $Y_{(p)}$ and (100 - p) percent of the values are greater than
$Y_{(p)}$. The 5th percentile is denoted by $P_5$, the 10th by $P_{10}$ and the 95th by $P_{95}$.
Percentiles for the ungrouped data

To calculate percentiles for the ungrouped data, the following procedure is
adopted:

i) Order the observations.
ii) For the mth percentile, determine the product $\frac{mn}{100}$. If $\frac{mn}{100}$ is not an integer,
round it up and find the corresponding ordered value, and if $\frac{mn}{100}$ is an
integer, say k, then calculate the mean of the kth and (k+1)th ordered
observations.

The procedure is explained on the data of the example 2.1 and $P_{10}$ and $P_{95}$ have
been calculated.

The ordered observations of the example 2.1 are:

87, 87, 88, 89, 89, 90, 91, 91, 92, 95, 96, 96, 97, 98, 98, 98, 99, 99, 100, 100, 101,
101, 102, 103, 105, 105, 106, 107, 107, 112.

To calculate $P_{10}$, (10)(30)/100 = 3, so we calculate the mean of the 3rd and
4th observations, i.e., $P_{10}$ = (88 + 89)/2 = 88.5.

To calculate $P_{95}$, (95)(30)/100 = 28.5, so the 29th observation is our 95th
percentile, i.e., $P_{95}$ = 107.
Percentiles for the grouped data

The mth percentile for grouped data (given in ascending order) is

$P_m = l + \frac{h}{f}\left(\frac{mn}{100} - c\right)$    (3.28)

Like the median, $\frac{mn}{100}$ is used to locate the mth percentile group.

l is the lower class boundary of the class containing the mth percentile.
h is the width of the class containing $P_m$.
f is the frequency of the class containing $P_m$.
n is the total number of frequencies.
c is the cumulative frequency of the class immediately preceding the
class containing $P_m$.
The 50th percentile is the median by definition, as half of the values in the data
are smaller than the median and half are larger than the median.
The 25th and 75th percentiles are the lower and upper quartiles respectively.
The quartiles, deciles and percentiles are also called quantiles or fractiles.
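Example 3.20: Find $P_{10}$, $P_{25}$, $P_{50}$ and $P_{95}$ of grouped data for the example 3.17.
Solution: $\frac{10n}{100}$th observation $= \frac{10 \times 30}{100}$th observation
= 3rd observation
So, $P_{10}$ group = 85.5 - 90.5
$P_{10} = l + \frac{h}{f}\left(\frac{10n}{100} - c\right)$
$= 85.5 + \frac{5}{6}(3 - 0)$
= 85.5 + 2.5 = 88.00 cm

$P_{25} = l + \frac{h}{f}\left(\frac{25n}{100} - c\right)$
$= l + \frac{h}{f}\left(\frac{n}{4} - c\right)$
$= Q_1$
Similarly, $P_{50} = Q_2$ = Median
and $P_{75} = Q_3$, already calculated under the example 3.17.

$\frac{95n}{100}$th observation $= \frac{95 \times 30}{100}$th observation
= 28.5th observation
So, $P_{95}$ group = 105.5 - 110.5
$P_{95} = l + \frac{h}{f}\left(\frac{95n}{100} - c\right)$
$= 105.5 + \frac{5}{3}(28.5 - 26)$
= 105.5 + 4.1667
= 109.6667 cm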
The percentiles and quartiles may be read directly from the graphs of the
cumulative frequency function as in chapter 2, where $Q_1$ is indicated. $Q_3$ may be
read corresponding to a relative cumulative frequency of 0.75.


3.7 Mode 


Mode is defined as the most frequent value in a data set. In case of ungrouped
data, the mode can be found by inspection of the order statistic. For example, consider five
plants having heights in cm of 87, 82, 87, 90, 89. The order statistic for this data would
be 82, 87, 87, 89, 90.

87 is the value that occurs twice while the others occur only once. So, by definition,
87 is the mode of this data. If data has only one mode, then it is called unimodal. The
data may have more than one mode. It may be bimodal (having two modes) or
multimodal (having more than two modes). The data is said to have no mode if every
value of the data occurs an equal number of times.


The mode for the grouped data (given in ascending order) is calculated by

$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$    (3.29)

l is the lower class boundary of the modal class.
$f_m$ is the frequency of the modal class.
$f_1$ is the frequency associated with the class preceding the modal class.
$f_2$ is the frequency associated with the class following the modal class.
h is the width of the modal class.
The modal class is the class in which the maximum frequency lies.

Example 3.21: Find mode for the data of the example 3.17.
Solution: The maximum frequency is 10 for the class 95.5 - 100.5, so it is the modal
class.
$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
Here l = 95.5, h = 5, $f_m$ = 10, $f_1$ = 4, $f_2$ = 6
$= 95.5 + \frac{10 - 4}{(10 - 4) + (10 - 6)} \times 5$
= 95.5 + 3.0
= 98.5 cm
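Formula (3.29) can be sketched as a small routine. The name grouped_mode is hypothetical, the frequencies 6, 4, 10, 6, 3, 1 are an assumption (those implied by the worked examples on this data), and treating a missing neighbouring class as frequency 0 is a choice made for this sketch rather than a rule from the text.

```python
def grouped_mode(boundaries, freqs):
    """Mode of grouped data by formula (3.29); uses the first class of maximum frequency."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    lower, upper = boundaries[i]
    f_m = freqs[i]
    f_1 = freqs[i - 1] if i > 0 else 0                  # class preceding the modal class
    f_2 = freqs[i + 1] if i < len(freqs) - 1 else 0     # class following the modal class
    h = upper - lower
    return lower + (f_m - f_1) / ((f_m - f_1) + (f_m - f_2)) * h

boundaries = [(85.5, 90.5), (90.5, 95.5), (95.5, 100.5),
              (100.5, 105.5), (105.5, 110.5), (110.5, 115.5)]
freqs = [6, 4, 10, 6, 3, 1]

mode = grouped_mode(boundaries, freqs)
print(mode)
```

The denominator is zero when the modal class and both neighbours have equal frequencies; the sketch does not guard against that case.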


Example 3.22: The table shows the distribution of the maximum loads in short tons
supported by certain cables produced by a company.

Maximum load (short tons)
9.3 - 9.7
9.8 - 10.2
10.3 - 10.7
10.8 - 11.2
11.3 - 11.7
11.8 - 12.2
12.3 - 12.7
12.8 - 13.2

Determine its mode.

Solution:

Class limits      Class boundaries
9.3 - 9.7         9.25 - 9.75
9.8 - 10.2        9.75 - 10.25
10.3 - 10.7       10.25 - 10.75
10.8 - 11.2       10.75 - 11.25
11.3 - 11.7       11.25 - 11.75
11.8 - 12.2       11.75 - 12.25
12.3 - 12.7       12.25 - 12.75
12.8 - 13.2       12.75 - 13.25

$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
$= 10.75 + \frac{17 - 12}{(17 - 12) + (17 - 14)} \times 0.5$
= 11.06 tons


Example 3.23: Find mode for the data of the example 2.1.

Y   87  88  89  90  91  92  95  96  97  98  99  100  101  102  103  105  106  107  112
f    2   1   2   1   2   1   1   2   1   3   2    2    2    1    1    2    1    2    1

Mode = The value which occurs most frequently in the data.

Mode = 98 cm.
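Finding the mode of ungrouped data by counting can be sketched as below. The function name modes is illustrative; returning an empty list when every value occurs equally often follows the "no mode" convention stated earlier in this section.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when every value occurs equally often."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                      # no mode
    return sorted(v for v, c in counts.items() if c == top)

# Data of example 2.1 (n = 30).
heights = [87, 87, 88, 89, 89, 90, 91, 91, 92, 95, 96, 96, 97, 98, 98,
           98, 99, 99, 100, 100, 101, 101, 102, 103, 105, 105, 106, 107, 107, 112]

print(modes(heights))
```

A result with one element corresponds to unimodal data, two elements to bimodal data, and more than two to multimodal data.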
3.7.1 Properties of mode 
i) If a constant a is added to each of the n observations $Y_1, Y_2, \ldots, Y_n$ having mode
m, then the mode of $a+Y_1, a+Y_2, \ldots, a+Y_n$ would be a+m.

ii) If each of the n observations $Y_1, Y_2, \ldots, Y_n$ having mode m is multiplied by a,
then the mode of $aY_1, aY_2, \ldots, aY_n$ would be am.


3.7.2 Merits of mode 
i) It is very quick to find. 


ii) It is not affected by extreme values.
3.7.3 Demerits of mode


i) It is not rigidly defined. 

ii) It is not capable of further mathematical development easily. 

iii) It uses only a few members of the population, so can be misleading in small 
data sets. 

iv) It is an unstable measure like median. 

v) There may be more than one value of the mode in a data set.

vi) It may not exist in many cases. 


3.7.4 Empirical Relationship Between Mean, Median And Mode 


The empirical relationship depends upon the shape of the distribution of the
data. The distribution of a data set is called symmetrical if the frequency curve for
the data is such that the portion of the curve to the left of its mean is the mirror image
of the portion to the right of the mean. Otherwise, the distribution is called skewed. The
skew may be to the right or to the left depending upon the shape of the curve. The
empirical relationship is described as follows:


a) In a single peaked symmetrical distribution, mean, median and mode are
equal, i.e.,
Mean = Median = Mode    (3.30)

It is indicated in figure 3.1.

Figure 3.1: Single peaked symmetrical distribution (Mean = Median = Mode).


b) For moderately positively skewed distributions, the following empirical
relation holds.
Mean > Median > Mode    (3.31)
It is indicated in figure 3.2.

Figure 3.2: Moderately positively skewed distribution


c) For moderately negatively skewed distributions, the following empirical
relation holds.
Mean < Median < Mode    (3.32)
It is indicated in figure 3.3.

Figure 3.3: Moderately negatively skewed distribution


d) For moderately skewed distributions, the median divides the distance between
mean and mode in the ratio 1:2, i.e.,

$\frac{\text{Mean} - \text{Median}}{\text{Median} - \text{Mode}} = \frac{1}{2}$

or Mode = 3 Median - 2 Mean    (3.33)


Example 3.24: If Mode = 15 and Median = 12, find the mean.
Solution: Mode = 15, Median = 12, Mean = ?
We know that
Mode = 3 Median - 2 Mean
$\text{Mean} = \frac{3\,\text{Median} - \text{Mode}}{2}$
$= \frac{3(12) - 15}{2} = \frac{36 - 15}{2}$
= 10.5

Example 3.25: Mean and Median of a frequency distribution are 45 and 30
respectively. Find mode of the distribution.

Solution: Mean = 45, Median = 30, Mode = ?
We know that
Mode = 3 Median - 2 Mean
= 3(30) - 2(45) = 90 - 90 = 0
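The empirical relation (3.33) and its rearrangement for the mean can be sketched as two one-line helpers. The function names are illustrative; the relation itself is only approximate and applies to moderately skewed distributions.

```python
def mode_from(mean, median):
    """Empirical mode from relation (3.33): Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean

def mean_from(median, mode):
    """Relation (3.33) rearranged for the mean: Mean = (3 Median - Mode) / 2."""
    return (3 * median - mode) / 2

print(mean_from(12, 15))   # the inputs of example 3.24
print(mode_from(45, 30))   # the inputs of example 3.25
```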


3.8 Selecting a Suitable Measure of Central Tendency 


To select an appropriate measure for a situation, certain factors are taken into
account. These include the type of variable, the purpose for which the statistic
would be used and the type of distribution.

For the quantitative variables, arithmetic mean is usually appropriate. For the
categorical variables, the median and mode are appropriate depending upon the type
of categories. For example, if we consider the eye colour, then mode is appropriate,
but if the categories are income groups, then the appropriate measure is median.

The type of the distribution is an important aspect in evaluating which statistic is
appropriate. If the distribution is symmetrical, all the measures, i.e., mean, median and
mode, being equal, are equally good. In the skewed distributions median is preferred as
it is not affected by the extreme observations. Medians are also preferred to the means
when the sample constitutes only a small part of the population. Geometric and
harmonic means are useful for averaging rates and ratios.


Exercise 3

3.1 i) Define arithmetic mean, geometric mean and harmonic mean. Explain the
situations in which each of them is most appropriately used.

ii) The relation between arithmetic mean (A.M), geometric mean (G.M) and
harmonic mean (H.M) is
A.M ≥ G.M ≥ H.M
Under what situation are these equal?

3.2 Find the geometric mean of 50, 67, 39, 40, 36, 60, 54, 43.

3.3 A man traveling 100 kilometers has 5 stages at equal intervals. The speed of
the man in the various stages was observed to be 10, 16, 20, 14, 15 kilometers
per hour. Find the average speed at which the man travels.

3.4 Calculate mean, median and mode for table 2.5.

3.5 Calculate mean, median and mode for the grouped data of table 2.7.

3.6 Calculate the following for the data of exercise 2.8 (b):
i) Calculate an estimate for the mean leaf length.
ii) Construct a cumulative frequency table and use it to estimate the sample

3.7 What do you understand by weighted mean? In what circumstances is it
preferred to ordinary mean and why?

3.8 Define the mode of a frequency distribution. How does it compare with other
types of averages?

3.9 i) Write down the empirical relation between mean, median and mode for
unimodal distribution of moderate asymmetry. Illustrate graphically the
relative positions of the mean, median and mode for frequency curves which
are skewed to the right and to the left.

ii) For a certain frequency distribution, with the mean and median 45 and 36
respectively, find the mode approximately using the empirical relation
between the three.

3.10 Bilal gets a rise of 10% in salary at the end of his first year of service and
further rise of 20% and 25% at the end of the second and third year
respectively. The rise in each case being calculated on his salary at the
beginning of the year. To what annual percentage increase is this equivalent?


3.11 Find the Mean for the following distribution.

3.12 The frequency distribution given below has been derived from the use of
working origin. If D = X - 18, find arithmetic mean and Geometric mean.

3.13 The reciprocals of 11 values of x are given below:
0.0500, 0.0454, 0.0400, 0.0333, 0.0285, 0.0232,


3.15 Three cities A, B, C are equidistant from each other. Fatima travels from A to
B at the speed of 30 miles per hour by car. From B to C at speed of 50 miles
per hour. Determine her average speed for the entire trip.

3.16 Harmonic Mean and Geometric Mean of two numbers are 3.2 and 4
respectively. Find their Arithmetic Mean and both the numbers as well.

3.17 The Arithmetic Mean and Geometric Mean of three numbers are 34 and 18
respectively. Find all the three numbers, when the Geometric Mean of the
first two numbers is 9.

3.18 Find out.

i) The average rate of motion in the case of a person who rides the first mile at
the rate of 10 miles per hour, the next mile at the rate of 8 miles per hour and
the third at the rate of 6 miles per hour.











ii) Increase in population which in the first decade has increased 20%, in the next
25% and in the third 4%.

3.19 The given table shows the distribution of the maximum loads in short tons
supported by certain cables produced by a company. Determine Mean, Median and
Mode.

Maximum Loads      No. of cables

3.20 Compute Mean, Median, Mode, 6th Decile, and 74th percentile for the data
given in the table:

0.7312 - 0.7313    10
0.7324 - 0.7325     2

3.21 Find the value $Q_3$, $D_5$, $P_5$ and mode for the following data:

3.22 If for any frequency distribution the Mean is 45 and the Median is 30. Find
Mode approximately, using formula connecting the three.

3.23 A bus traveling 200 miles has ten stages at equal intervals. The speed of the
bus in the various stages was observed to be 10, 15, 20, 75, 20, 30, 40, 50, 30,
40 miles per hour. Find the average speed at which the bus has traveled.

3.24 The following data has been obtained from a frequency distribution of a
continuous variable x after making the substitution $u = \frac{x - 136.5}{6}$

Find Harmonic Mean.


3.25 Salman obtained the following marks in a certain examination. Find the
weighted mean if weights 4, 3, 3, 2 and 2 respectively are allotted to the
subjects.

3.27 For a certain distribution, if $\sum (x - 15) = 5$, $\sum (x - 18) = 0$, $\sum (x - 21) = -21$,
what is the value of A.M and why?

3.28 Arithmetic Mean of 15 values is 20 and by adding 3 more values, the mean
remains 20. Find the new three values if ratio is a:b:c :: 3:2:1.
3.29 State the following as true or false.
i) In a symmetrical distribution, mean, median and mode are equal.
ii) The algebraic sum of the deviations for a set of observations from their mean
is always zero.
iii) Median is affected by extreme observations.
iv) Mode is not affected by extreme observations.
v) Frequency polygon is an increasing function.
vi) For highly skewed distributions, median is preferred over mean.
vii) Mean of a data set remains unchanged if a constant is added to each
observation.
3.30 What type of averages would you prefer to average the following:
i) Marks obtained in an examination.
ii) Growth rate of population of different cities.
iii) Height of students.
iv) Size of agricultural holdings.
v) Increase in salaries.
vi) IQ level of students in a class.
3.31 Fill in the blanks.
i) An average obtained to represent a data is called ________.
ii) A good average should not be affected by ________ values.
iii) Sum of deviations from mean is always ________.
iv) Median is a value that divides an ordered data into ________ parts.
v) For estimating an average rate of ________, ________ is a
better average.
vi) The mean and median of two values is always ________.
vii) In qualitative data, the most suitable average is ________.
viii) A distribution having two modes is ________ distribution.
ix) In symmetrical distribution, the three averages mean, median and
mode are ________.
x) If extreme large or small values are changed, values of ________ are
not affected.


3.32 Write T for true and F for false against each statement.
i) The sum of deviations of the values from mean is minimum.
ii) Geometric mean is possible only for positive values.
iii) The median divides the data into two halves.
iv) The third quartile is the median.
v) A distribution having only one mode is called uni-modal distribution.
vi) Arithmetic mean depends on all the values of the data.
vii) Mean, Median and Mode in a symmetrical distribution are not equal.
viii) Harmonic Mean can be calculated if any value is zero.
ix) Median is not affected by extreme values.
x) Geometric Mean cannot be calculated if any value is negative.


Measures of Dispersion

4.1 Introduction


The measure of central tendency does not tell us anything about the spread of
the values in a set, because any two sets with vast difference in magnitude of their
variability may have the same central tendency. Look at the following two data sets:

Data set a: 8, 7, 5, 8, 6        Data set b: 1, 4, 7, 10, 12

These two data sets have the same mean 6.8 but differ in their variations from the
central value. There is more variation in data set b as compared to data set a. This
illustrates the fact that a measure of central tendency alone is not sufficient.

To give a sensible description of data, a numerical quantity called measure of 
dispersion or variability that describes the spread of the values in a set of data is 
required. 

Two types of measures of dispersion or variability are defined: 

i) Absolute measures ii) Relative measures 

The absolute measures are defined in such a way that they have units (meters, 
grams etc) same as those of the original measurements, whereas the relative measures 
have no units as these are ratios. 

The most common measures of absolute variability are: 

a) Range b) Quartile Deviation 

c) Mean Deviation d) Variance 

e) Standard Deviation 

These are also called measures of dispersion or measures of spread. 

The relative measures are discussed in article 4.2 


4.1.1 Range 


The range of n values $Y_1, Y_2, \ldots, Y_n$ is defined as the difference between the
largest and smallest observation. If $Y_{(1)}$ is smallest in magnitude and $Y_{(n)}$ is largest in
magnitude, then the range, denoted by R, is defined by:

$R = Y_{(n)} - Y_{(1)}$    (4.1)
This is a very simple measure of variability and only takes into account the two
most extreme observations.
Example 4.1: Calculate range for the following observations (in cm):
84.2, 87.5, 80.7, 92.4, 91.9, 86.5, 85.4
Solution: Here $Y_{(1)}$ = 80.7 and $Y_{(n)}$ = 92.4
so R = 92.4 - 80.7
= 11.7 cm
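Formula (4.1) is a one-liner in code. The sketch below (function name value_range is illustrative) applies it to example 4.1 and to data sets a and b of the introduction, which share the mean 6.8 but differ in spread.

```python
def value_range(values):
    """Range R = largest observation - smallest observation, as in (4.1)."""
    return max(values) - min(values)

lengths = [84.2, 87.5, 80.7, 92.4, 91.9, 86.5, 85.4]   # example 4.1
set_a = [8, 7, 5, 8, 6]
set_b = [1, 4, 7, 10, 12]

print(round(value_range(lengths), 1))
print(value_range(set_a), value_range(set_b))
```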
Range for Grouped Data: In case of grouped data, it may be calculated by the
following formula:
R = mid value of the highest class - mid value of the lowest class
Example 4.2: The following frequency distribution gives the weights of 90 cotton
bales. Find its range.

Solution:

Class boundaries
69.5 - 74.5
74.5 - 79.5
79.5 - 84.5
84.5 - 89.5
89.5 - 94.5
94.5 - 99.5

The mid value of the first group is $\frac{69.5 + 74.5}{2} = 72$ and the mid value of the last
group is $\frac{94.5 + 99.5}{2} = 97$, so
$Y_{(n)}$ = 97.0, $Y_{(1)}$ = 72.0
$R = Y_{(n)} - Y_{(1)}$ = 97.0 - 72.0 = 25


Merits of range: 


i) It is easy to calculate. 
ii) It is a useful measure in small samples. 


Demerits of range: 


i) It is not based on all the observations. 
ii) It depends only upon the extreme observations. 


4.1.2 Quartile Deviation 


This measure is based on quartiles $Q_1$ and $Q_3$ and is denoted by Q.D. It is
calculated as

$Q.D. = \frac{Q_3 - Q_1}{2}$    (4.2)

It is also known as semi-interquartile range.

This measure cannot be negative because the upper quartile must be at least as large as
the lower quartile. A small value of quartile deviation indicates a small amount of
variability, whereas larger values indicate more variability in the data set. It measures
half of the difference between the upper and lower quartiles.


Example 4.3: Calculate quartile deviation for the data of the example 3.16.
Solution: $Q_1$ = 88.03 cm, $Q_3$ = 94.90 cm
Therefore, $Q.D. = \frac{94.90 - 88.03}{2}$
= 3.435 cm

The formula of quartile deviation for grouped data is the same as for
ungrouped data, i.e.,
$Q.D. = \frac{Q_3 - Q_1}{2}$

Example 4.4: Calculate quartile deviation for the data of the example 3.17.
Solution: $Q_1$ = 92.3750, $Q_3$ = 102.5833
$Q.D. = \frac{102.5833 - 92.3750}{2}$ = 5.104 cm


Example 4.5: For the data given below:
1030, 1590, 1070, 1670, 1110, 1710, 1190, 1720, 1230, 1740, 1310, 1745,
1332, 1775, 1870, 1350, 1430, 1870, 1950 and 1460,

calculate Quartile Deviation and co-efficient of Quartile Deviation.

Solution: Arraying the data
1030, 1070, 1110, 1190, 1230, 1310, 1332, 1350, 1430, 1460, 1590, 1670,
1710, 1720, 1740, 1745, 1775, 1870, 1870, 1950.
Here, n = 20
$Q_1$ = Value of $\left(\frac{n+1}{4}\right)$th item
= Value of $\left(\frac{20+1}{4}\right)$th item
= Value of (5.25)th item
$Q_1$ = 5th value + 0.25 (6th value - 5th value)
= 1230 + 0.25 (1310 - 1230)
= 1250

$Q_3$ = Value of $\left(\frac{3(n+1)}{4}\right)$th item
= Value of $\left(\frac{3(20+1)}{4}\right)$th item
= Value of (15.75)th item
$Q_3$ = 15th value + 0.75 (16th value - 15th value)
= 1740 + 0.75 (1745 - 1740)
= 1743.75

$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{1743.75 - 1250}{2}$ = 246.88

Co-efficient of Q.D. $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{1743.75 - 1250}{1743.75 + 1250}$ = 0.16

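Example 4.6: For the following frequency distribution, find quartile deviation and
co-efficient of quartile deviation.

Class       Frequency (f)
10 - 20       3
20 - 30       8
30 - 40      14
40 - 50       7
50 - 60       4

Solution:

Class       Frequency (f)    Cumulative frequency (C.F.)
10 - 20       3               3
20 - 30       8               3 + 8 = 11
30 - 40      14               11 + 14 = 25
40 - 50       7               25 + 7 = 32
50 - 60       4               32 + 4 = 36

$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$
n = 36, $\frac{n}{4} = \frac{36}{4}$ = 9th observation, so l = 20, h = 10, f = 8, c = 3
$Q_1 = 20 + \frac{10}{8}\left(\frac{36}{4} - 3\right)$ = 27.5

$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$
$\frac{3n}{4}$ = 27th observation, so l = 40, h = 10, f = 7, c = 25
$Q_3 = 40 + \frac{10}{7}(27 - 25)$ = 42.86

$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{42.86 - 27.5}{2}$ = 7.68

Co-efficient of Q.D. $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{42.86 - 27.5}{42.86 + 27.5}$ = 0.22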


Example 4.7: Find the semi-interquartile range and co-efficient of quartile
deviation for the data given below about the ages in a locality.

Ages         20    30    40    50    60    70    80
Frequency     3    61   132   158   140    51     2

Solution:

Class boundaries    f      Cumulative frequency
15 - 25               3      3
25 - 35              61     64
35 - 45             132    196
45 - 55             158    354
55 - 65             140    494
65 - 75              51    545
75 - 85               2    547

n = 547, $\frac{n}{4} = \frac{547}{4}$ = 136.75, i.e. the 137th observation, so
l = 35, h = 10, f = 132, c = 64
$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right) = 35 + \frac{10}{132}(136.75 - 64)$ = 40.51

$\frac{3n}{4} = \frac{3 \times 547}{4}$ = 410.25, i.e. the 410th observation, so
l = 55, h = 10, f = 140, c = 354
$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right) = 55 + \frac{10}{140}(410.25 - 354)$ = 59.02

Semi-interquartile range $= \frac{Q_3 - Q_1}{2} = \frac{59.02 - 40.51}{2}$ = 9.255

Co-efficient of Q.D. $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{59.02 - 40.51}{59.02 + 40.51} = \frac{18.51}{99.53}$ = 0.19

Merits of quartile deviation
i) It is easy to calculate.
ii) It is not affected by extreme observations.

Demerits of quartile deviation

i) It is not based on all the observations.
ii) Q.D. will be the same value for all the distributions having the same quartiles.


4.1.3 Mean deviation 

It is defined as the mean of the absolute deviations of observations from mean,
median or mode. By absolute deviations we mean that we consider all the deviations
as positive. It is denoted by M.D. and is calculated as

$M.D. = \frac{\sum |Y - M|}{n}$ (M is mean or median or mode)    (4.3)
Example 4.8: For ungrouped data of the example 3.13, find mean deviation.
Solution: $Y_i$: 88.03, 94.50, 94.90, 95.05, 84.60

$\sum Y$ = 88.03 + 94.50 + .... + 84.60 = 457.08

$\bar{Y} = \frac{\sum Y}{n} = \frac{457.08}{5}$ = 91.416

Mean Deviation (M.D.) $= \frac{\sum |Y_i - \bar{Y}|}{n}$

where the $|Y_i - \bar{Y}|$ are
|88.03 - 91.416| = 3.386
|94.50 - 91.416| = 3.084
|94.90 - 91.416| = 3.484
|95.05 - 91.416| = 3.634
|84.60 - 91.416| = 6.816

$\sum |Y_i - \bar{Y}|$ = 20.404

$M.D. = \frac{20.404}{5}$ = 4.0808 cm
Dealing with the grouped data, mean deviation is calculated by multiplying
the absolute deviations from mean with the corresponding frequencies and then
taking the mean, i.e.,

$M.D. = \frac{\sum f_i |Y_i - \bar{Y}|}{\sum f_i}$

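Example 4.9: For the data of the example 3.2, find mean deviation for grouped data.

Solution:

$\bar{Y} = \frac{\sum f_i Y_i}{\sum f_i} = \frac{2935}{30}$ = 97.8333 cm

Classes       f     Mid point $Y_i$    $|Y_i - \bar{Y}|$    $f_i |Y_i - \bar{Y}|$
86 - 90        6      88               9.8333              58.9998
91 - 95        4      93               4.8333              19.3332
96 - 100      10      98               0.1667               1.6670
101 - 105      6     103               5.1667              31.0002
106 - 110      3     108              10.1667              30.5001
111 - 115      1     113              15.1667              15.1667
Total         30                                          156.6670

Mean deviation (M.D.) $= \frac{\sum f_i |Y_i - \bar{Y}|}{\sum f_i}$
$= \frac{156.6670}{30}$
= 5.2222 cm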
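The mean deviation for ungrouped and grouped data can be sketched as two short routines. The function names and the optional centre argument are illustrative assumptions; the class mid points 88, 93, ..., 113 and frequencies 6, 4, 10, 6, 3, 1 are assumed to be those implied by the column of products in example 4.9.

```python
def mean_deviation(values, centre=None):
    """Mean of absolute deviations from `centre` (defaults to the arithmetic mean)."""
    n = len(values)
    if centre is None:
        centre = sum(values) / n
    return sum(abs(y - centre) for y in values) / n

def grouped_mean_deviation(midpoints, freqs, centre=None):
    """Frequency-weighted mean of absolute deviations of class mid points."""
    total = sum(freqs)
    if centre is None:
        centre = sum(f * y for y, f in zip(midpoints, freqs)) / total
    return sum(f * abs(y - centre) for y, f in zip(midpoints, freqs)) / total

heights = [88.03, 94.50, 94.90, 95.05, 84.60]     # example 3.13
midpoints = [88, 93, 98, 103, 108, 113]           # assumed class mid points
freqs = [6, 4, 10, 6, 3, 1]                       # assumed class frequencies

md_ungrouped = mean_deviation(heights)
md_grouped = grouped_mean_deviation(midpoints, freqs)
print(round(md_ungrouped, 4), round(md_grouped, 4))
```

Passing the median as centre, for example mean_deviation(heights, 94.50), gives the mean deviation from the median instead.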


Mean deviation is sometimes defined in terms of absolute deviations from the
median as:

$M.D. = \frac{\sum |Y_i - \text{Median}|}{n}$    (4.4)

The mean deviation from median for the data of the example 3.13 is
calculated as:

|88.03 - 94.50| = 6.47
|94.50 - 94.50| = 0.0
|94.90 - 94.50| = 0.40
|95.05 - 94.50| = 0.55
|84.60 - 94.50| = 9.90

$\sum |Y_i - \text{median}|$ = 17.32

$M.D. = \frac{17.32}{5}$ = 3.464
The mean deviation from median for grouped data is:

$M.D. = \frac{\sum f_i |Y_i - \text{Median}|}{\sum f_i}$

The calculations for mean deviation from median, taking the median equal to 98
for the example 4.9, are:
Properties of mean deviation
i) M.D. from median is less than M.D. from any other value, i.e.,
$\frac{\sum |Y_i - \text{median}|}{n}$ is least.

ii) It is always greater than or equal to zero, i.e.,
$M.D. \geq 0$

iii) For symmetrical distributions, the following relation holds
$M.D. = \frac{4}{5}\sigma$    (4.5)

where $\sigma$ is the standard deviation.
Merits of mean deviation
i) It is easy to calculate.
ii) It is based on all the observations.

Demerits of mean deviation

i) It is affected by the extreme values.

ii) It is not readily capable of mathematical development.

iii) It does not take into account the negative signs of the deviations from some
average.


Example 4.10: Find mean deviation from median for the following frequency distribution.
Also calculate the co-efficient of mean deviation.

Solution:

$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$, with $\frac{n}{2}$ = 37.5
= 16.25

Mean Deviation from median $= \frac{\sum f |Y - \text{Median}|}{\sum f} = \frac{293.75}{75}$ = 3.92

Co-efficient of mean deviation $= \frac{\text{Mean Deviation from median}}{\text{Median}}$
$= \frac{3.92}{16.25}$ = 0.24
4.1.4 The Variance 
Variance of the observations is defined as the mean of squares of deviations of all
the observations from their mean. When it is calculated from the population, the
variance is called population variance and is denoted by $\sigma^2$, and when it is calculated
from the sample, based on n values $Y_1, Y_2, \ldots, Y_n$, it is called sample variance and is
denoted by $S^2$. The population variance $\sigma^2$ is defined as

$\sigma^2 = \frac{\sum (Y_i - \mu)^2}{N}$

where $\mu$ is the population mean and N is the number of values in the population.

The sample variance $S^2$ for un-grouped data is defined as:

$S^2 = \frac{\sum (Y_i - \bar{Y})^2}{n}$    (4.6)

Short formula for variance is given by

$S^2 = \frac{\sum Y^2}{n} - \left(\frac{\sum Y}{n}\right)^2$    (4.7)

For a frequency distribution, the sample variance $S^2$ is defined as:

$S^2 = \frac{\sum f_i (Y_i - \bar{Y})^2}{\sum f_i}$    (4.8)

Short formula for grouped data is given by

$S^2 = \frac{\sum fY^2}{\sum f} - \left(\frac{\sum fY}{\sum f}\right)^2$    (4.9)

If A is an arbitrary value such that D = Y - A, $S^2$ for ungrouped data is
given by

$S^2 = \frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2$    (4.10)

for grouped data

$S^2 = \frac{\sum fD^2}{\sum f} - \left(\frac{\sum fD}{\sum f}\right)^2$    (4.11)

When data are grouped into a frequency distribution with equal class intervals
of size h, taking $u = \frac{Y - A}{h}$,

$S^2 = h^2\left[\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2\right]$    (4.12)
Merits and Demerits of Variance. 
The variance is based on all the observations of a series. It is easy to calculate 


and simple to understand. It is affected by extreme values. 


4.1.5 Standard Deviation 

The standard deviation is defined as the positive square root of the mean of
the squares of the deviations of values from their mean. In other words, standard
deviation is the positive square root of variance. It is denoted by S and is given by

For ungrouped data

$S = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}}$    (4.13)

In short cut method

$S = \sqrt{\frac{\sum Y^2}{n} - \left(\frac{\sum Y}{n}\right)^2}$    (4.14)

For frequency distribution

$S = \sqrt{\frac{\sum f (Y - \bar{Y})^2}{\sum f}}$    (4.15)

In short cut method

$S = \sqrt{\frac{\sum fY^2}{\sum f} - \left(\frac{\sum fY}{\sum f}\right)^2}$    (4.16)

If D = Y - A, the deviations of Y from any arbitrary value A, then standard deviation is

$S = \sqrt{\frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2}$    (4.17)

For frequency distribution, the formula becomes

$S = \sqrt{\frac{\sum fD^2}{\sum f} - \left(\frac{\sum fD}{\sum f}\right)^2}$    (4.18)

For coding variable $u = \frac{Y - A}{h}$, the formula becomes

$S = h\sqrt{\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2}$    (4.19)
Example 4.11: Calculate variance and standard deviation for the data: 3.042.131,0- 


Solution: 


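$$\bar{y} = \frac{\sum y}{n} = \frac{24}{6} = 4$$

$$\text{Variance} = S^2 = \frac{\sum (y - \bar{y})^2}{n} = 4.67$$

$$\text{Standard deviation} = S = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = \sqrt{4.67} = 2.16$$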





Example 4.12: Calculate variance and standard deviation from the following 


frequency distribution. 


Wages | 30-35 | 35-40 | ia | 45-50 | | 50-55. | | 55-60 | 
agate | 8 2 I a 
Solution:





Example 4.13: Calculate variance and standard deviation by using any provisional 
mean from the data: 3,5,7,13,15,17,23,27. 
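Solution:

$$\text{Variance} = S^2 = \frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2$$

$$= 65.5 - 1.562$$

$$= 63.938$$

$$\text{Standard Deviation } (S) = \sqrt{\frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2} = \sqrt{63.938} = 8.00$$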
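The provisional-mean calculation of Example 4.13 can be reproduced in a few lines. This is a sketch under stated assumptions: the provisional means `a=15` and `a=10` are choices made here for illustration (the text does not fix them), and the function name is hypothetical.

```python
import math


def variance_with_provisional_mean(values, a):
    """Formula (4.10): variance from deviations D = y - a about any origin a."""
    n = len(values)
    deviations = [y - a for y in values]
    return sum(d * d for d in deviations) / n - (sum(deviations) / n) ** 2


data = [3, 5, 7, 13, 15, 17, 23, 27]  # data of Example 4.13

variance = variance_with_provisional_mean(data, a=15)      # assumed origin
variance_other_origin = variance_with_provisional_mean(data, a=10)
std_dev = math.sqrt(variance)
print(variance, variance_other_origin, std_dev)
```

Both origins give the same variance, which illustrates that the result does not depend on the provisional mean chosen; only the size of the intermediate numbers changes.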
PROPERTIES OF THE VARIANCE AND STANDARD DEVIATION
1. The variance and standard deviation of a constant are zero. If a is a constant, then
   var (a) = 0
   S.D. (a) = 0
2. The variance and standard deviation are independent of origin.
   var (y + a) = var (y)
   var (y - a) = var (y)
   and S.D. (y + a) = S.D. (y)
   S.D. (y - a) = S.D. (y)
3. When all the values are multiplied by a constant, the variance of the values is multiplied by the square of the constant and their standard deviation is multiplied by the absolute value of the constant, i.e.,
   var (ay) = $a^2$ var (y)
   var (y/a) = $(1/a^2)$ var (y)
   and S.D. (ay) = |a| S.D. (y)
   S.D. (y/a) = |1/a| S.D. (y)
4. The variance of the sum or difference of two independent variables x and y is the sum of their respective variances. The standard deviation of the sum or difference is the square root of this sum; it is not the sum of the two standard deviations.
   var (x + y) = var (x) + var (y)
   var (x - y) = var (x) + var (y)
   and S.D. (x + y) = S.D. (x - y) = $\sqrt{\text{var}(x) + \text{var}(y)}$
5. If k sets of data consisting of $n_1, n_2, \ldots, n_k$ values have corresponding means $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_k$ and variances $S_1^2, S_2^2, \ldots, S_k^2$, the variance of the combined set of data is given by

$$S_c^2 = \frac{\sum n_i \left[S_i^2 + (\bar{y}_i - \bar{y}_c)^2\right]}{\sum n_i} \tag{4.20}$$

   where $\bar{y}_c$ stands for the combined mean:

$$\bar{y}_c = \frac{n_1 \bar{y}_1 + n_2 \bar{y}_2 + \cdots + n_k \bar{y}_k}{n_1 + n_2 + \cdots + n_k}$$
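The properties above can be verified numerically on a small data set. The sketch below is illustrative only: the two groups, the shift of 10 and the multiplier 3 are hypothetical choices, and the helper names are not from the text. It checks independence of origin, the effect of scaling, and the combined variance formula (4.20) against the variance of the pooled values.

```python
def variance(values):
    """Variance with divisor n, as in formula (4.6)."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / n


def combined_variance(sizes, means, variances):
    """Formula (4.20): combined variance from group sizes, means and variances."""
    total = sum(sizes)
    combined_mean = sum(n * m for n, m in zip(sizes, means)) / total
    return sum(
        n * (v + (m - combined_mean) ** 2)
        for n, m, v in zip(sizes, means, variances)
    ) / total


group_1 = [3, 5, 7, 13]      # hypothetical split of a small data set
group_2 = [15, 17, 23, 27]
pooled = group_1 + group_2

var_pooled = variance(pooled)
var_shifted = variance([y + 10 for y in pooled])   # property 2
var_scaled = variance([3 * y for y in pooled])     # property 3
var_combined = combined_variance(
    [len(group_1), len(group_2)],
    [sum(group_1) / len(group_1), sum(group_2) / len(group_2)],
    [variance(group_1), variance(group_2)],
)
print(var_pooled, var_shifted, var_scaled, var_combined)
```

The combined formula reproduces the pooled variance exactly because it adds the within-group spread to the spread of the group means around the combined mean.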


4.2 Co-efficient of Variation and Other Relative Measures


The most important of all the relative measures of dispersion is the co-efficient of variation. The co-efficient of variation is a relative measure of dispersion that is independent of the units of measurement and is expressed as a percentage. It is used to compare the variability of different sets of data. The group which has the lower value of the co-efficient of variation is comparatively more consistent. The co-efficient of variation is defined as:

$$\text{C.V.} = \frac{S}{\bar{y}} \times 100 \quad \text{(for sample)}$$

$$\text{C.V.} = \frac{\sigma}{\mu} \times 100 \quad \text{(for population)}$$


As it is a ratio of two quantities with the same units, it is a dimensionless quantity, i.e., for the same data the co-efficient of variation remains the same and has no unit, whether the data are in millimeters, centimeters or meters.


As the co-efficient of variation expresses variability relative to the mean, it is 
called a measure of relative variability or relative dispersion. 


The co-efficient of variation for the example 4.9 is given by

$$\text{C.V.} = \frac{S}{\bar{y}} \times 100$$

with $S = 6.8837$ and $\bar{y} = 97.8833$, so

$$\text{C.V.} = \frac{6.8837}{97.8833} \times 100 = 7.03\%$$

A large value of C.V. indicates that the observations have much spread relative to the size of the mean, and vice versa.


This measure can be used to compare the variability of two or more populations. It will take the same value for two or more populations if in each population the standard deviation is directly proportional to the mean. In such a situation, we say that the two or more populations are consistent. For example, to compare the consistency of two methods, each method was tried on 16 soil samples and the corresponding results obtained are:

          | Method I | Method II
$\bar{y}$ | 15.0     | 10.5
S         | 1.4      | 1.0
C.V. (%)  | 9.3      | 9.5

The C.V.s being almost equal indicates that both methods are equally reliable. We do not compare the standard deviations directly, since the means are widely different.
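The comparison above amounts to one division per data set. A minimal sketch follows; the function name is hypothetical, and the standard deviations 1.4 and 1.0 for the two methods are assumed values chosen to be consistent with the quoted C.V.s of 9.3 and 9.5 percent.

```python
def coefficient_of_variation(std_dev, mean):
    """C.V. = (S / mean) * 100, expressed in percent."""
    return std_dev / mean * 100


# Example 4.9 values quoted in the text.
cv_example = coefficient_of_variation(6.8837, 97.8833)

# Two methods tried on soil samples; S values are assumptions (see lead-in).
cv_method_1 = coefficient_of_variation(1.4, 15.0)
cv_method_2 = coefficient_of_variation(1.0, 10.5)

print(round(cv_example, 2), round(cv_method_1, 1), round(cv_method_2, 1))
```

Because the unit of measurement cancels in the ratio, the same function can be applied to data recorded in different units without conversion.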





Some other relative measures of dispersion are:

a) Co-efficient of Range $= \dfrac{y_{(n)} - y_{(1)}}{y_{(n)} + y_{(1)}}$ (4.21)

b) Co-efficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$ (4.22)

c) Mean co-efficient of dispersion $= \dfrac{\text{M.D.}}{\bar{y}}$ (4.23)

d) Median co-efficient of dispersion $= \dfrac{\text{M.D.}}{\text{Median}}$ (4.24)

e) Co-efficient of standard deviation $= \dfrac{S}{\bar{y}}$ (4.25)


Example 4.14: Calculate co-efficient of variation and co-efficient of standard 
deviation from the following frequency distribution. 


wit ai Ne ead er ae PE oa 


Solution:

$$\text{Co-efficient of S.D.} = \frac{\text{Standard deviation}}{\text{Mean}} = \frac{S}{\bar{y}} = \frac{1.307}{1.2} = 1.089$$

$$\text{Co-efficient of variation} = \frac{S}{\bar{y}} \times 100 = \frac{1.307}{1.2} \times 100 = 108.92\%$$


Example 4.15: Given the following results, find the combined co-efficient of variation.

$n_1 = 100$, $S_1 = 2.4$, $\bar{y}_1 = 12.5$
$n_2 = 120$, $S_2 = 4.2$, $\bar{y}_2 = 15.8$
$n_3 = 150$, $S_3 = 3.7$, $\bar{y}_3 = 10.5$

Solution: Combined mean is given by

$$\bar{y}_c = \frac{n_1 \bar{y}_1 + n_2 \bar{y}_2 + n_3 \bar{y}_3}{n_1 + n_2 + n_3} = \frac{100(12.5) + 120(15.8) + 150(10.5)}{100 + 120 + 150} = 12.76$$

Combined variance is given by

$$S_c^2 = \frac{n_1\left[S_1^2 + (\bar{y}_1 - \bar{y}_c)^2\right] + n_2\left[S_2^2 + (\bar{y}_2 - \bar{y}_c)^2\right] + n_3\left[S_3^2 + (\bar{y}_3 - \bar{y}_c)^2\right]}{n_1 + n_2 + n_3}$$

$$= \frac{100\left[5.76 + (12.5 - 12.76)^2\right] + 120\left[17.64 + (15.8 - 12.76)^2\right] + 150\left[13.69 + (10.5 - 12.76)^2\right]}{370}$$

$$= \frac{6628.19}{370} = 17.9140, \quad \text{so } S_c = 4.2325$$

$$\text{C.V.} = \frac{S_c}{\bar{y}_c} \times 100 = \frac{4.2325}{12.76} \times 100 = 33.17\%$$

4.3 Moments

The measures of location along with measures of variability are useful to describe a data set but fail to tell anything about the shape of the distribution. For this purpose, we need to define certain other measures. Some important measures about the shape of the distribution depend upon what we call moments. These measures are discussed under skewness and kurtosis.


4.3.1 Moments about mean 
The moments about the mean are the means of the deviations from the mean after raising them to integer powers. The rth population moment about the mean, denoted by $\mu_r$, is defined as:

For ungrouped data

$$\mu_r = \frac{\sum (y_i - \mu)^r}{N} \tag{4.26}$$

where r = 1, 2, ....

The corresponding sample moment about the mean $\bar{y}$, denoted by $m_r$, is given by

$$m_r = \frac{\sum (y_i - \bar{y})^r}{n} \tag{4.27}$$

So, the first moment $m_1$ is given by

$$m_1 = \frac{\sum (y_i - \bar{y})}{n}$$

$m_1$ is always zero as the numerator $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$.

Second moment $m_2$ is given by:

$$m_2 = \frac{\sum (y_i - \bar{y})^2}{n}$$

This is the same as the variance $S^2$.

Third moment $m_3$ is given by:

$$m_3 = \frac{\sum (y_i - \bar{y})^3}{n}$$

and fourth moment $m_4$ is given by:

$$m_4 = \frac{\sum (y_i - \bar{y})^4}{n}$$

If the data are grouped, then the rth sample moment about the mean $\bar{y}$ is defined as:

$$m_r = \frac{\sum f_i (y_i - \bar{y})^r}{n}, \quad \text{where } n = \sum f_i \tag{4.28}$$

so that

$$m_1 = \frac{\sum f_i (y_i - \bar{y})}{\sum f_i} = 0, \quad m_2 = \frac{\sum f_i (y_i - \bar{y})^2}{\sum f_i}, \quad m_3 = \frac{\sum f_i (y_i - \bar{y})^3}{\sum f_i}, \quad m_4 = \frac{\sum f_i (y_i - \bar{y})^4}{\sum f_i}$$


Example 4.16: Calculate first four moments about the mean for the following set of 


marks obtained in the examination. 
45, 32, 37, 46, 39, 36, 41, 48 and 36. 


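Solution:

$$\bar{y} = \frac{\sum y}{n} = \frac{360}{9} = 40$$

$$m_1 = \frac{\sum (y - \bar{y})}{n} = \frac{0}{9} = 0$$

$$m_2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{232}{9} = 25.78$$

$$m_3 = \frac{\sum (y - \bar{y})^3}{n} = \frac{186}{9} = 20.67$$

$$m_4 = \frac{\sum (y - \bar{y})^4}{n} = \frac{10708}{9} = 1189.78$$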
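The four moments of Example 4.16 follow directly from relation (4.27). The sketch below recomputes them from the marks given in the example; the function name is a hypothetical helper, not notation from the text.

```python
def central_moment(values, r):
    """rth sample moment about the mean, relation (4.27)."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** r for y in values) / n


marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]  # data of Example 4.16

m1, m2, m3, m4 = (central_moment(marks, r) for r in (1, 2, 3, 4))
print(m1, round(m2, 2), round(m3, 2), round(m4, 2))
```

The first moment is zero by construction, and the second equals the variance with divisor n, so only the third and fourth moments carry new information about shape.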


Example 4.17: Find first four moments about the mean for the following data. 


eer «ane pes eee | 


Solution: 















eked | 1950 | 


eT > £50 
ee VIET) es 4 
aes ae 


bepee edie  Po8P 


ae 50 
$$m_3 = \frac{\sum f (y - \bar{y})^3}{\sum f} = \frac{-5250}{50} = -105$$

$$m_4 = \frac{\sum f (y - \bar{y})^4}{\sum f} = \frac{281050}{50} = 5621$$


4.3.2 Moment about an arbitrary value 


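The rth sample moment about any arbitrary origin a, denoted by $m'_r$, is defined as:

$$m'_r = \frac{\sum (y_i - a)^r}{n} = \frac{\sum D_i^r}{n} \tag{4.29}$$

where $D_i = (y_i - a)$, so

$$m'_1 = \frac{\sum (y_i - a)}{n} = \frac{\sum D_i}{n} \tag{4.30}$$

$$m'_2 = \frac{\sum (y_i - a)^2}{n} = \frac{\sum D_i^2}{n} \tag{4.31}$$

$$m'_3 = \frac{\sum (y_i - a)^3}{n} = \frac{\sum D_i^3}{n} \tag{4.32}$$

$$m'_4 = \frac{\sum (y_i - a)^4}{n} = \frac{\sum D_i^4}{n} \tag{4.33}$$

The moments about the mean are usually called central moments and the moments about any arbitrary origin a are called non-central moments or raw moments.

The rth sample moment for grouped data about any arbitrary origin a, denoted by $m'_r$, is defined as:

$$m'_r = \frac{\sum f_i (y_i - a)^r}{\sum f_i} = \frac{\sum f_i D_i^r}{\sum f_i} \tag{4.34}$$

$$m'_1 = \frac{\sum f_i (y_i - a)}{\sum f_i} = \frac{\sum f_i D_i}{\sum f_i} \tag{4.35}$$

$$m'_2 = \frac{\sum f_i (y_i - a)^2}{\sum f_i} = \frac{\sum f_i D_i^2}{\sum f_i} \tag{4.36}$$

$$m'_3 = \frac{\sum f_i (y_i - a)^3}{\sum f_i} = \frac{\sum f_i D_i^3}{\sum f_i} \tag{4.37}$$

$$m'_4 = \frac{\sum f_i (y_i - a)^4}{\sum f_i} = \frac{\sum f_i D_i^4}{\sum f_i} \tag{4.38}$$

The moments about the mean are obtained from the moments about the arbitrary origin by the relations:

$$m_1 = m'_1 - m'_1 = 0 \tag{4.38.a}$$

$$m_2 = m'_2 - (m'_1)^2 \tag{4.38.b}$$

$$m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3 \tag{4.38.c}$$

$$m_4 = m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4 \tag{4.38.d}$$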
Example 4.18: Find first four moments about the mean for ungrouped data of the example 2.1.

Solution: The moments can be calculated directly by using relation (4.27), or by using relation (4.29) after selecting an arbitrary origin a. We calculate the moments about the origin a = 98 and then calculate the moments about the mean by using the relations (4.40) to (4.43).

With $D_i = y_i - 98$ and n = 30, the column totals of the working table are $\sum D_i = -11$, $\sum D_i^2 = 1309$, $\sum D_i^3 = -917$ and $\sum D_i^4 = 124225$.

$$m'_1 = \sum D_i / n = -11/30 = -0.3667$$

$$m'_2 = \sum D_i^2 / n = 1309/30 = 43.6333$$

$$m'_3 = \sum D_i^3 / n = -917/30 = -30.5667$$

$$m'_4 = \sum D_i^4 / n = 124225/30 = 4140.8333$$

The moments about mean are given by

$$m_1 = m'_1 - m'_1 = 0$$

$$m_2 = m'_2 - (m'_1)^2 = 43.6333 - (-0.3667)^2 = 43.6333 - 0.1345 = 43.4988$$

$$m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3 = -30.5667 - 3(43.6333)(-0.3667) + 2(-0.3667)^3$$

$$= -30.5667 + 48.0010 - 0.0989 = 17.3354$$

$$m_4 = m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4$$

$$= 4140.8333 - 4(-30.5667)(-0.3667) + 6(43.6333)(-0.3667)^2 - 3(-0.3667)^4$$

$$= 4140.8333 - 44.8352 + 35.2039 - 0.0542 = 4131.1478$$
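The conversion from moments about an arbitrary origin to moments about the mean is mechanical and easy to script. The following sketch applies the relations used in Example 4.18 to the totals quoted there (n = 30, sums of powers of D about a = 98); the function name is a hypothetical helper. Because it keeps full precision for $m'_1$ instead of rounding to four decimals, its results differ from the hand calculation in the third decimal place.

```python
def central_from_raw(m1_raw, m2_raw, m3_raw, m4_raw):
    """Convert the first four raw moments (about any origin) to central moments."""
    m2 = m2_raw - m1_raw ** 2
    m3 = m3_raw - 3 * m2_raw * m1_raw + 2 * m1_raw ** 3
    m4 = (m4_raw - 4 * m3_raw * m1_raw
          + 6 * m2_raw * m1_raw ** 2 - 3 * m1_raw ** 4)
    return 0.0, m2, m3, m4  # the first central moment is always zero


n = 30
# Totals of D, D^2, D^3, D^4 about a = 98, as quoted in Example 4.18.
raw = (-11 / n, 1309 / n, -917 / n, 124225 / n)

m1, m2, m3, m4 = central_from_raw(*raw)
print(round(m2, 4), round(m3, 4), round(m4, 4))
```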


4.3.3 Moments for grouped data 


The moments for grouped data about an arbitrary origin with equal class interval h may be written as:

$$m'_r = h^r \, \frac{\sum f_i u_i^r}{n} \tag{4.39}$$

where $u_i = \dfrac{y_i - a}{h}$.

The moments about an arbitrary origin and moments about mean have the following relationships.

$$m_1 = m'_1 - m'_1 = 0 \tag{4.40}$$

$$m_2 = m'_2 - (m'_1)^2 \tag{4.41}$$

$$m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3 \tag{4.42}$$

$$m_4 = m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4 \tag{4.43}$$

The moments about the mean are calculated by using the above relations, in which one first needs to calculate the moments about the arbitrary origin, or they can be calculated directly by using the formula for moments about the mean.
Example 4.19: Find first four moments about an arbitrary origin a = 98 for grouped data of the example 3.2. Also find moments about mean by using the moments about origin.

Solution: As the widths of the classes are equal, we use the formula involving h. First the moments about origin a = 98 are calculated. From the working table, $\sum f_i u_i = -1$, $\sum f_i u_i^2 = 55$, $\sum f_i u_i^3 = 5$ and $\sum f_i u_i^4 = 235$.

We know that:

$$m'_r = \frac{h^r \sum f_i u_i^r}{n}, \quad \text{here } h = 5, \; n = \sum f_i = 30$$

$$m'_1 = h \left(\sum f_i u_i\right)/n = 5(-1)/30 = -0.1667$$

$$m'_2 = h^2 \left(\sum f_i u_i^2\right)/n = 5^2 (55)/30 = 45.8333$$

$$m'_3 = h^3 \left(\sum f_i u_i^3\right)/n = 5^3 (5)/30 = 20.8333$$

$$m'_4 = h^4 \left(\sum f_i u_i^4\right)/n = 5^4 (235)/30 = 4895.8333$$

The moments about the mean are

$$m_1 = m'_1 - m'_1 = 0$$

$$m_2 = m'_2 - (m'_1)^2 = 45.8333 - (-0.1667)^2 = 45.8055$$

$$m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3 = 20.8333 - 3(45.8333)(-0.1667) + 2(-0.1667)^3$$

$$= 20.8333 + 22.9167 - 0.0093 = 43.7407$$

$$m_4 = m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4$$

$$= 4895.8333 - 4(20.8333)(-0.1667) + 6(45.8333)(-0.1667)^2 - 3(-0.1667)^4$$

$$= 4895.8333 + 13.8916 + 7.6419 - 0.0023 = 4917.3645$$


The first four moments calculated from the same data in ungrouped form and 
grouped form are slightly different. This is because of the assumption that each 
observation in a class is equal to mid point of that class while grouping the data. 
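For grouped data with equal class width, relation (4.39) needs only the coded totals. The sketch below starts from the totals quoted in Example 4.19 (h = 5, n = 30 and the sums of f·u, f·u², f·u³, f·u⁴); the function and variable names are hypothetical.

```python
def raw_moment_coded(h, r, sum_fu_r, n):
    """Relation (4.39): m'_r = h^r * sum(f * u^r) / n for coded values u."""
    return h ** r * sum_fu_r / n


h, n = 5, 30
coded_sums = {1: -1, 2: 55, 3: 5, 4: 235}  # totals from Example 4.19

raw = {r: raw_moment_coded(h, r, total, n) for r, total in coded_sums.items()}
m2 = raw[2] - raw[1] ** 2  # relation (4.41)

print({r: round(v, 4) for r, v in raw.items()}, round(m2, 4))
```

Working with the coded values u keeps every intermediate sum small; the factor $h^r$ restores the original scale at the end.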


4.3.4 Moment about zero 


If the variable y assumes n values $y_1, y_2, y_3, \ldots, y_n$, then the rth moment about zero can be obtained by taking a = 0, so, from relation 4.29,

$$m'_r = \frac{\sum y_i^r}{n}$$

Putting r = 1, 2, 3 and 4 we get

$$m'_1 = \frac{\sum y_i}{n} = \bar{y}, \quad m'_2 = \frac{\sum y_i^2}{n}, \quad m'_3 = \frac{\sum y_i^3}{n}, \quad m'_4 = \frac{\sum y_i^4}{n}$$

$m'_1$, $m'_2$, $m'_3$ and $m'_4$ are the first four moments about zero.

For a frequency distribution, the raw moments about zero are given by

$$m'_1 = \frac{\sum f y}{\sum f}, \quad m'_2 = \frac{\sum f y^2}{\sum f}, \quad m'_3 = \frac{\sum f y^3}{\sum f}, \quad m'_4 = \frac{\sum f y^4}{\sum f}$$

4.4 Sheppard's Correction for Grouping Error





In case of grouped data, we proceed to calculate the first four moments by replacing all the members of a class by the mid value of the respective class. The choice of class boundaries and the mid values affects the values of our approximation to the first four moments. The first and third moments are not affected as much as the second and fourth moments because, in the case of the second and fourth moments, all the deviations become positive and the grouping error accumulates. Sheppard has suggested the following corrections for the second and fourth moments in case of grouped data where the frequency curve of the grouped data approaches the base line gradually and slowly at each end of the distribution.

$$\text{corrected } m_2 = m_2 - \frac{h^2}{12} \tag{4.44}$$

$$\text{corrected } m_4 = m_4 - \frac{h^2}{2} m_2 + \frac{7}{240} h^4 \tag{4.45}$$

For the example 4.19, corrected $m_2$ and $m_4$ are:

$$\text{corrected } m_2 = 45.8055 - \frac{5^2}{12} = 43.722$$

$$\text{corrected } m_4 = 4917.3645 - \frac{5^2}{2}(45.8055) + \frac{7}{240}(5^4) = 4363.0249$$
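Sheppard's corrections (4.44) and (4.45) are a direct substitution. The sketch below applies them to the moments about the mean obtained in Example 4.19 ($m_2$ = 45.8055, $m_4$ = 4917.3645, h = 5); the function name is a hypothetical helper.

```python
def sheppard_corrected(m2, m4, h):
    """Apply Sheppard's corrections to the second and fourth central moments."""
    m2_corrected = m2 - h ** 2 / 12
    m4_corrected = m4 - (h ** 2 / 2) * m2 + (7 / 240) * h ** 4
    return m2_corrected, m4_corrected


m2_c, m4_c = sheppard_corrected(m2=45.8055, m4=4917.3645, h=5)
print(round(m2_c, 4), round(m4_c, 4))
```

Note that the correction to $m_4$ uses the uncorrected $m_2$, exactly as written in (4.45).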


4.5 Skewness


The word skewness means lack of symmetry of a distribution. A symmetrical distribution is one in which mean, median and mode are identical and the portion of the frequency polygon to the left of the mean is the mirror image of the portion to the right of the mean. If a distribution is not symmetrical, it is called skewed or asymmetrical.

Figure 4.1 shows the three types of distributions, i.e., the symmetrical distribution in figure 4.1(a), the positively skewed distribution in figure 4.1(b) and the negatively skewed distribution in figure 4.1(c). A positively skewed distribution is one whose tail extends to the right hand side and a negatively skewed distribution has a longer tail towards the left hand side. The positions of mean, median and mode are shown in figure 4.1.

Figure 4.1: Three types of distributions. (a) Symmetrical: mean, median and mode coincide. (b) Positively skewed: mode, median and mean in that order from left to right. (c) Negatively skewed: mean, median and mode in that order from left to right.

The skewness may be very extreme and in such a case these are called J-shaped distributions. This is shown in figure 4.2.

Figure 4.2: J-shaped distributions.
One of the numerical measures used to know about the symmetry of a distribution is $\sqrt{\beta_1}$ ($\beta_1$ is a Greek letter read as beta one) and is defined as:

$$\sqrt{\beta_1} = \frac{\mu_3}{\sqrt{\mu_2^3}} \tag{4.46}$$

It is a dimensionless quantity. If $\sqrt{\beta_1} = 0$, the distribution is symmetrical. If $\sqrt{\beta_1} < 0$, the distribution is negatively skewed and for $\sqrt{\beta_1} > 0$, the distribution is positively skewed. The population parameters are estimated from the corresponding sample statistics. The estimate of $\sqrt{\beta_1}$ is denoted by $\sqrt{b_1}$ and is defined as:

$$\sqrt{b_1} = \frac{m_3}{\sqrt{m_2^3}} \tag{4.47}$$

It should be noted that these sample statistics tell us only about the particular data set under consideration and not about the whole population.

Some other measures of skewness are:

a) Karl Pearson's first co-efficient of skewness

$$S_k = \frac{\text{Mean} - \text{Mode}}{S} \tag{4.48}$$

b) Karl Pearson's second co-efficient of skewness

$$S_k = \frac{3(\text{Mean} - \text{Median})}{S} \tag{4.49}$$

These coefficients are pure numbers; they are zero for symmetrical distributions, negative for negatively skewed distributions and positive for positively skewed distributions.

c) Bowley's coefficient of skewness based on quartiles

$$S_k = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} \tag{4.50}$$

It is a pure number and lies between -1 and +1. For symmetrical distributions its value is zero.


a 


The skewness of the distribution of a data set can easily be seen by drawing 
histogram or frequency curve. 
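Example 4.20: The heights of 100 college students measured to the nearest inch are given in the following table:

Height (inches) | 60-62 | 63-65 | 66-68 | 69-71 | 72-74
Frequency       | 5     | 18    | 42    | 27    | 8

Calculate co-efficient of skewness.

Solution: Taking the provisional mean A = 67 and D = y - A, the working table gives $\sum f = 100$, $\sum f D = 45$ and $\sum f D^2 = 873$. The class boundaries are 59.5-62.5, 62.5-65.5, 65.5-68.5, 68.5-71.5 and 71.5-74.5.

$$\bar{y} = A + \frac{\sum f D}{\sum f} = 67 + \frac{45}{100} = 67.45 \text{ inches}$$

$$S = \sqrt{\frac{\sum f D^2}{\sum f} - \left(\frac{\sum f D}{\sum f}\right)^2} = \sqrt{\frac{873}{100} - \left(\frac{45}{100}\right)^2} = 2.92 \text{ inches}$$

$$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h = 65.5 + \frac{42 - 18}{(42 - 18) + (42 - 27)} \times 3 = 67.35 \text{ inches}$$

$$\text{Co-efficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{67.45 - 67.35}{2.92} = 0.034$$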
Example 4.21: The weight of 38 male students at a university are given in the 


following frequency table: 


7 eli Spee 2 sis | 


Calculate Bowley’s co-efficient of skewness. 


Co-efficient of skewness = 









Solution: 
118 —- 126 3 | 117.5- 126.5 
127 — 135 126.5 — 135.5 
136 — 144 135.5 — 144.5 
145 — 153 144.5 — 153.5 
154 — 162 153.5 — 162.5 


163 — 171 162.5 — 171.5 


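From the cumulative frequencies, the quartiles are found to be $Q_1 = 136.62$ and $Q_3 = 153.13$.

$$\text{Median} = Q_2 = 144.5 + \frac{9}{12}\left(\frac{38}{2} - 17\right) = 146$$

$$\text{Bowley's co-efficient of skewness} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{153.13 + 136.62 - 2(146)}{153.13 - 136.62} = \frac{-2.25}{16.51} = -0.136$$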
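The three co-efficients of skewness (4.48) to (4.50) can be written as small functions. The sketch below reuses the summary values quoted in Examples 4.20 and 4.21 (mean, mode, standard deviation, quartiles and median); the function names are hypothetical helpers.

```python
def pearson_first(mean, mode, std_dev):
    """Karl Pearson's first co-efficient of skewness, relation (4.48)."""
    return (mean - mode) / std_dev


def pearson_second(mean, median, std_dev):
    """Karl Pearson's second co-efficient of skewness, relation (4.49)."""
    return 3 * (mean - median) / std_dev


def bowley(q1, median, q3):
    """Bowley's co-efficient of skewness based on quartiles, relation (4.50)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


sk_heights = pearson_first(mean=67.45, mode=67.35, std_dev=2.92)
sk_weights = bowley(q1=136.62, median=146, q3=153.13)
print(round(sk_heights, 3), round(sk_weights, 3))
```

A positive value for the heights and a negative value for the weights indicate slight positive and negative skewness respectively; both are close to zero, so neither distribution is far from symmetrical.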





4.5.1 Kurtosis 


The word kurtosis is used to indicate the length of the tails and peakedness of 
symmetrical distributions. Symmetrical distributions may be platykurtic, mesokurtic 
(normal) or leptokurtic. 

The mesokurtic is the usual normal distribution. A leptokurtic distribution is more peaked and has many values around the mean and in the tails away from the mean, whereas a platykurtic distribution is a bit flat and has more values between the mean and the tails as compared to the mesokurtic (normal) distribution. Figure 4.3 shows these shapes of distributions.

Figure 4.3: Mesokurtic, platykurtic and leptokurtic distributions

The leptokurtic distribution may be a composite of two normal distributions with the same mean but different variances. The platykurtic distribution may be a composite of two normal populations with the same variance but different means.

The dimensionless measure of kurtosis based on the moments is $\beta_2$ and is defined as:

$$\beta_2 = \frac{\mu_4}{\mu_2^2} \tag{4.51}$$

If $\beta_2 = 3$, the distribution is mesokurtic (normal). If $\beta_2 < 3$, the distribution is platykurtic and if $\beta_2 > 3$, the distribution is leptokurtic. If this measure is calculated by using sample moments then

$$b_2 = \frac{m_4}{m_2^2} \tag{4.52}$$

Here, $b_2$ is the estimate of $\beta_2$.

Another measure of kurtosis, not widely used, is the percentile co-efficient of kurtosis, denoted by K.

$$K = \frac{\text{Q.D.}}{P_{90} - P_{10}} \tag{4.53}$$

where Q.D. stands for quartile deviation and $P_{90}$ and $P_{10}$ are the 90th and 10th percentiles respectively. K is 0.263 for a normal distribution and lies between 0 and 0.50.

Example 4.22: Calculate $\sqrt{b_1}$ and $b_2$ for ungrouped data of the example 4.18 and for grouped data of the example 4.19.

Solution:

For ungrouped data: For ungrouped data of the example 4.18 we have,

$m_2 = 43.4988$, $m_3 = 17.3354$, $m_4 = 4131.1478$

$$\sqrt{b_1} = \frac{m_3}{\sqrt{m_2^3}} = \frac{17.3354}{\sqrt{(43.4988)^3}} = 0.0604$$

$$b_2 = \frac{m_4}{m_2^2} = \frac{4131.1478}{(43.4988)^2} = 2.1833$$

For grouped data: For grouped data of the example 4.19, we have

$m_2 = 45.8055$, $m_3 = 43.7407$, $m_4 = 4917.3645$

$$\sqrt{b_1} = \frac{m_3}{\sqrt{m_2^3}} = \frac{43.7407}{\sqrt{(45.8055)^3}} = 0.1411$$

$$b_2 = \frac{m_4}{m_2^2} = \frac{4917.3645}{(45.8055)^2} = 2.3437$$

4.1 What do you understand by dispersion? What are the most usual methods of 


measuring dispersion, indicate the advantages and disadvantages of these 
methods? 


4.2 Define mean deviation and its co-efficient. Discuss its advantages and uses.
4.3 i) What is the semi-interquartile range?
ii) Define range and discuss its uses.


4.4 Explain the difference between absolute dispersion and relative dispersion. 
Describe the properties of the standard deviation. 


4.5 i) Define various measures of dispersion and give their formulae.


ii) The following table gives the marks of students: 


Marks | 039 | Oa | 5059 | OO | 7079 | 
if | ete 


Calculate: 


a) Quartile deviation b) Co-efficient of skewness 
4.6 Calculate Mean Deviation from Mean, Mean co-efficient of dispersion and
variance from the data given below: 


Weights) Weights (ka) 





also. calculate Range, Quartile Deviation, and co-efficient of Quartile 
Deviation. 


4.7 Calculate quartile deviation for the data given below: 


[25-150 
Frequency fox FF tale MBuiafetc Wa ena fy 


also calculate co-efficient of Quartile Deviation. 


4.8 Calculate standard deviation, variance and co-efficient of variation from the 
following data: 





4.9 The mean of a set of 10 values is 25.2 and its standard deviation is 3.72 for 
another set of 15 values mean and standard deviation are 25.2 and 4.05 
respectively. Find the combined standard deviation of the 25 values taken 
together. 


4.10 For a group of 50 boys, the mean score and the standard deviation of scores on a test are 59.5 and 8.38 respectively; for a group of 40 girls, the mean and standard deviation are 54.0 and 8.23 respectively on the same test. Find the mean and standard deviation for the combined group of 90 children.


4.11 By multiplying each number 3, 6, 1, 7, 2, 5 by 2 and then adding 5, we obtained 
11, 17, 7, 19, 9, 15. What is the relationship between standard deviation and 
means for the two sets of numbers? 


4.12 The scores obtained by 5 students on a set of examination papers were 70, 50, 
60, 70, 50. Their scores are changed by 
i) adding 10 point to scores ii) increasing all scores by 10%. 


What effect will these changes have on mean and on standard deviation? 


4.13 Compute the mean wages and the co-efficient of variation for the employees 


working in two factories are given in the following table: 
4.15 Calculate mean deviation (about median) for the distribution given below:
| | 
100 — 110 
110-120 
120 — 130 
130 — 140 
140 — 150 


150 — 160 
160 — 170 
170 — 180 
180 — 190 
190 — 200 





4.16 Calculate standard deviation by using arithmetic mean and also by using any 
provisional mean and compare the results for the data given below: 


OST PSS) W237 


4.17 A manufacturer of television tubes produces two types, A and B, of tubes. The tubes have respective mean life times $\bar{X}_A = 1496$ hours and $\bar{X}_B = 1895$ hours and standard deviations $S_A = 280$ hours and $S_B = 310$ hours. Which tube has the greater:
i) absolute dispersion ii) relative dispersion.
4.18 i) What are moments about mean and about an arbitrary value? Give the relation between them.
ii) Define the moment ratios $b_1$ and $b_2$.
4.19 A computer calculated the mean and standard deviation from 20 observations as 42 and 5 respectively. It was later discovered at the time of checking that it had copied down two values as 45 and 38 whereas the correct values were 35 and 58 respectively. Find the correct value of the co-efficient of variation.


4.20 A distribution consists of 3 components with frequencies 100, 120 and 150 having means 5.5, 15.8 and 10.5 and standard deviations 2.4, 4.2 and 3.7 respectively. Find the co-efficient of variation for the combined distribution.

4.21 Calculate first four moments about mean for the following set of examination marks.
45, 32, 37, 46, 39, 36, 41, 48 and 36.

4.22 Calculate first four moments for the following distribution of wages about y = 10. Find moments about mean.
Herning’ |S) © | P| eater ott | epi) #4 | 1
msrp ays fs [T

4.23 First three moments of a distribution about y = 4 are 1, 4 and 10 respectively. Find the co-efficient of variation. Is the distribution symmetrical, positively skewed or negatively skewed?

4.24 First four moments of a certain distribution about y = 17.5 are 0.3, 74, 45 and 12125 respectively. Find out whether the distribution is leptokurtic or platykurtic.

4.25 What can you say of skewness in each of the following cases:
i) Ox= 26.01; 03=38:292 Oi 43 33
ii) Mean = 1403 and Mode = 1487

4.26 Given that $\sum f = 76$, $\sum f y = 572$, $\sum f y^2 = 4848$, $\sum f y^3 = 44240$ and $\sum f y^4 = 42580$, find first four moments about mean and test the distribution for symmetry and kurtosis.

4.27 If a distribution has mean 1403 and mode 1487, what can you say about the skewness?

4.28 Lower and upper quartiles of a distribution are 142.36 and 167.73 respectively while median is 153.50. Find co-efficient of skewness.

4.29 The daily incomes of employees range from Rs.0 to Rs.18. They are grouped in intervals of Rs.2 and class frequencies from the lowest to the highest class are 5, 39, 69, 41, 29, 22, 16, 7 and 5. Find the co-efficient of skewness.

4.30 First four moments of a distribution about x = 2 are 1, 2.5, 5.5 and 16. Calculate mean and co-efficient of variation.

4.31 Find moments about mean, $\beta_1$ and $\beta_2$, given the first 4 moments about y = 20 as 2, 15, -25 and 80 respectively.

4.32 What is meant by skewness and kurtosis? What aspects of the frequency curve are measured by them?

4.33 Second moments about mean of two distributions are 9 and 16 while fourth moments about mean are 230 and 780 respectively. Which of the distributions is
i) Leptokurtic ii) Platykurtic


4.34 What can you say about skewness in each of the following cases?
i) Median is 26.01 while two Quartiles are 13.73 and 28.29.
ii) Mean = 140 and Mode = 148.7.
iii) First three moments about 16 are 0.35, 2.9 and 1.93 respectively.

4.35 i) The second moments about mean of two distributions are 13.76 and 63.0 while the fourth moments about the mean are 528.06 and 9500 respectively. Which of the distributions is
a) Leptokurtic b) Mesokurtic c) Platykurtic.
ii) The fourth central moment of a symmetrical distribution is 243. What would be the value of standard deviation for which the distribution is mesokurtic?

4.36 The second moment about mean of a distribution is 25. What would be the value of the fourth moment about mean if the distribution is
i) Leptokurtic ii) Mesokurtic iii) Platykurtic.





4.37 Which of the following is correct for a negatively skewed distribution:
i) A.M. is greater than mode. ii) A.M. is less than mode.
iii) A.M. is greater than median.
4.38 What would be the shape and the name of the distribution if
i) Mean = Median = Mode ii) Mean > Median > Mode
iii) Mean < Median < Mode iv) $\beta_1 = 0$ and $\beta_2 = 3$
v) $\beta_1 = 0$ and $\beta_2 = 5$


4.39 Against each statement, write T for true and F for false statement. 


i) The sum of squares of the deviations for a data set from the median is 


minimum. 


ii) The sum of absolute deviations for a data set from the mean is


minimum. 


iii) If each of the observations in a data set is multiplied by a constant, the 


variance of the resulting data set increases. 


iv) If a constant is added in each of the observations in a data set, then the 


variance of the resulting data set increases. 


v) Standard deviation is a positive square root of variance. 


vi) | Range is a measure of absolute dispersion. 

vii) A relative measure is independent of unit. 

viii) Mean deviation is not based on all the observations. 

ix) Semi-inter quartile Range is also called inter quartile range. 


x) The standard deviation is dependent upon origin. 


xi) Co-efficient of variation is a relative measure of dispersion.

xii) The first moment about mean is one.

xiii) If $b_1 = 3$, the distribution will be symmetrical.

xiv) Mean Deviation is always less than the standard deviation.

xv) The normal distribution is also known as mesokurtic distribution.


4.40 Fill in the blanks.

i) A measure of dispersion is ________.

ii) A measure of dispersion expressed as a co-efficient is called ________ measures of dispersion.

iii) Sum of absolute deviations are minimum if computed from ________.

iv) The value of standard deviation does not ________ if a constant is added or subtracted from all observations.

v) Co-efficient of variation is always ________ from unit of ________.

vi) A data having least C.V. is considered more ________.

vii) The lack of symmetry is called ________.

viii) In a symmetrical distribution, the quartile deviation is ________ from the median on both sides.

ix) Sheppard's correction is applicable when the frequency distribution tends to ________ in both directions.

x) A relative measure of dispersion is the ________ between absolute dispersion and the average.


Index Numbers



5.1 Introduction 


The buying power of a rupee varies from time to time: the amount of a commodity one could buy for 10 rupees in 1960 costs about 60 rupees in 1995. So, to make meaningful comparisons over time, it is necessary to take into account the variability in the buying power of a rupee. For example, to compare the cost of a 2-year college education today with its cost in 1960, it is necessary to consider the buying power of a rupee today as compared with the buying power of a rupee in 1960. Similarly, one may be interested to know the average hourly wages of a labourer in 1995 as compared with the wages in 1960. The numbers known as index numbers are computed for this purpose and measure the relative change in a variable over time. These are usually constructed for variables such as prices, quantities, wages, investment and cost of living to help governments, economists and business people.


An index number is a ratio or an average of ratios, usually expressed as a percentage. To construct index numbers, two or more time periods are considered. The values at one of the time periods are taken as a base. The ratio of the values at the other time periods to the base period, when expressed as a percentage, shows the percentage change in the value from the value of the base period. The index number is usually denoted by $I_n$ and is calculated by the relation:

$$I_n = \frac{\text{price in current year}}{\text{price in base year}} \times 100 \tag{5.1}$$


We will use the following notations in this chapter.

$p_{oi}$ = price of the ith commodity in the base year.
$q_{oi}$ = quantity of the ith commodity in the base year.
$p_{ni}$ = price of the ith commodity in the current year.
$q_{ni}$ = quantity of the ith commodity in the current year.
$P_{on}$ = price index number for the current year. $P_{o1}$ means the index number for the year next to the base year and so on.
$Q_{on}$ = quantity index number for the current year. $Q_{o1}$ means the index number for the year next to the base year and so on.
$I_n$ is also used to denote the index number for the current year in the literature.


The index number for the base year is always taken as 100. 


Example 5.1: The data given below are available about the price of wheat for the years 1989 to 1994. The interest is to compare the price of wheat in these years taking 1989 as the base year.









1989 1991 | 1992 1994 
ee 





co 





Solution: 


The price index for each year is calculated by the ratio

$$P_{on} = \frac{\text{price in current year}}{\text{price in 1989}} \times 100$$

The index number for 1989 is the ratio of the price in 1989 to the price in 1989 expressed as a percentage, i.e.,

(85/85)(100) = 100

The index number for 1990 is

(96/85)(100) = 112.94 and so on. The price indices are

Price - Price index 
(1989 base) 






ES 


The price index column indicates percentages of the 1989 price for each year. For example, the price in 1992 is 145.88% of the price in 1989. So, the price of wheat is 45.88% higher in 1992 as compared with the wheat price in 1989.
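The simple index calculation of Example 5.1 can be sketched as follows. Only the prices that the text quotes or implies are used: 85 for 1989 and 96 for 1990 are stated, while 124 for 1992 is an inference from the quoted index of 145.88 and should be treated as an assumption. The function name is hypothetical.

```python
def simple_index(values, base_year):
    """Express each year's value as a percentage of the base-year value."""
    base = values[base_year]
    return {year: round(value / base * 100, 2) for year, value in values.items()}


# 1989 and 1990 prices are quoted in the text; the 1992 price is inferred.
wheat_prices = {1989: 85, 1990: 96, 1992: 124}

price_index = simple_index(wheat_prices, base_year=1989)
print(price_index)
```

The base year always maps to 100, and the excess over 100 in any other year is the percentage change relative to the base.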


5.1.1 Types of index numbers 


Index numbers are generally classified into the following two types.
i. Simple index numbers.
ii. Composite or aggregate index numbers.

Simple index number

An index number is called a simple index number when it measures a relative change in a single variable with respect to a base year. Examples are index numbers for wages of labourers, index numbers of wheat prices and index numbers for the volume of a commodity (produced, purchased, sold, consumed etc.) over time. In example 5.1 above, simple index numbers have been calculated.

If we are calculating a price index ($P_{on}$), then the formula is

$$P_{on} = \frac{\text{price in year } n}{\text{price in base year}} \times 100 \tag{5.2}$$


Similarly, wage index and other indices are calculated. 


Example 5.2: Compare the daily wages of unskilled labourers in Lahore over the period 1988-93, taking 1988 as the base year. The following data are available from the Pakistan Economic Survey 1993. Wages are in rupees.

Year  | 1988 | 1989 | 1990 | 1991 | 1992 | 1993
Wages | 46   | 51   | 58   | 71   | 71   | 86


Solution: The daily wage index $W_{0n}$ for each year is the ratio

$$W_{0n} = \frac{\text{daily wages in year } n}{\text{daily wages in year 1988}} \times 100$$
Using this formula and taking 1988 wages as the base, the wage index 
numbers are calculated for each year and are given below in the last column. 
Year | Wages (Rupees) | Wage index (1988 base)
1988 | 46 | 100.00
1989 | 51 | 110.87
1990 | 58 | 126.09
1991 | 71 | 154.35
1992 | 71 | 154.35
1993 | 86 | 186.96





The wage index of 126.09 for 1990 indicates that wages have increased by 26.09 percent as compared with the base year 1988. Similarly, the wages in 1993 have increased by 86.96 percent.


Composite or aggregate index numbers 

An index number is called a composite (aggregate) index number when it measures a relative change in two or more variables with respect to a base year. Examples are index numbers for comparing two sets of prices of a wide variety of commodities, and index numbers for comparing two sets of quantities of a wide variety of commodities. These are calculated in the following two ways:

i. Unweighted index numbers 

ii. Weighted index numbers 


The unweighted and weighted indices may measure changes in the price, quantity or value of a commodity. Accordingly these may be

a. Price index numbers
b. Quantity index numbers
c. Value index numbers

(a) and (b) are discussed in articles 5.4 and 5.5.


The value index number, denoted by $V_{0n}$, is

$$V_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_0} \times 100 \qquad (5.3)$$

where

$\sum p_n q_n$ = total value of all commodities in a given year.

$\sum p_0 q_0$ = total value of all commodities in the base year.


The simple and composite indices are discussed in detail under headings 5.4 
and 5.5. 


5.1.2 Limitations of index numbers 


Some limitations of index numbers are given below:

i) It is not possible to take into account all changes in a product.
ii) Not every index number is suitable for every purpose.
iii) There may be errors in the choice of base periods.
iv) These are simply rough indications of the relative changes.
v) Different methods of construction of index numbers give different results.


Use of Index numbers 


i) The price index numbers are used to measure the average price changes in a commodity or a group of commodities with the passage of time relative to the base period. This helps in comparing the prices of one commodity with another.

ii) The price index numbers are used to measure the buying power of money.

iii) The consumer price indices (CPI) are used by governments as a factor to cancel out the effect of inflation or deflation, as these measure the changes in the prices of consumer goods.

iv) The wholesale price index numbers (WPI) help in the adjustment of contract prices and payments by industrial organizations, as these measure changes in producers' selling prices.

v) The quantity index numbers are used to measure changes in the quantities produced, consumed, purchased, sold, exported or imported.

vi) The index numbers help economists and businessmen to describe existing conditions and to plan for the near future.

vii) The index numbers of import and export prices help to measure changes in the terms of trade of a country.
5.2 Construction of price index numbers

The construction of different index numbers involves the following main steps.


Purpose and scope 


The first step is to define the purpose of index numbers. It should be 
clearly mentioned why, where and what changes are to be measured. 


Selecting components
i. Commodities. ii. Prices of commodities.

An index number is constructed to serve a particular purpose, so it is important to select the commodities to be included keeping in view the cost of collecting data. The items should be precisely defined in terms of quality specifications, and relevant data should be readily available over time, as some items become obsolete with the introduction of new products. For practical purposes, the number of items selected should not be less than twenty. The prices of these items should be collected from different places keeping in view the quality of the items. The sampling should be carried out with care, as the posted or listed prices are sometimes not the retail prices.


Choosing the base year 


The purpose of constructing index numbers is to make comparisons. So, 
the base year should be chosen with care to be a year of normal prices 
and should not be a year too far from the current year. It is also 
possible to use an average of prices for several years to act as a base 
year e.g. it may be an average of 3 years. Usually base year is taken 
fixed because of comparison purposes but it may be taken as variable 
in case we calculate what we call “link relatives” discussed under 
5.4 (iii) 


Choosing the weights 


The weights chosen should indicate the relative importance of various 
commodities to be included in the construction of an index. As all the 
commodities selected are not equally important so they should be 
given different weights. For example wheat should be given more 
weight as compared with tea. 
The weights chosen for the price index may be the quantities of the base year or of the current year; which to use depends on the situation. The quantities may be quantities produced or quantities marketed. The distinction matters because, in agricultural economies, a high proportion of the food produced is often consumed by the families themselves and not sold in the markets.


The weights may be formulated as follows.
Let $W_{0i}$ denote the weight for the ith commodity in the base year. If $V_{0i}$ is the value of the marketed or produced commodity in the base year, then

Value of the commodity = price x quantity

$$V_{0i} = P_{0i} Q_{0i}$$

The weight is

$$W_{0i} = \frac{V_{0i}}{\sum_{i=1}^{k} V_{0i}} \qquad (5.4)$$

Here, $\sum V_{0i}$ is the total value of the k items in the base year.
Choosing the average

An index number can be constructed as an average of ratios, according to the definition of an index number. For example, consider k commodities; then the arithmetic mean of the k ratios, each computed for a single commodity, is

$$P_{0n} = \frac{1}{k} \sum_{i=1}^{k} \frac{P_{ni}}{P_{0i}} \times 100 \qquad (5.5)$$

Other averages such as the geometric mean, harmonic mean or median can also be used instead of the arithmetic mean. Although the geometric mean is more appropriate for averaging ratios, in practice the arithmetic mean is used for convenience.


5.3 Unweighted index numbers 


The idea is to give equal weights to each item in the index. The unweighted 
index numbers are computed in the following two ways: 
i. Simple aggregate Index

It is the ratio of the sum of prices (quantities) of commodities for a given year to the sum of prices (quantities) of the same commodities in the base year, expressed as a percentage.

The price index, denoted by $P_{0n}$, is calculated by the formula

$$P_{0n} = \frac{\sum P_n}{\sum P_0} \times 100 = \frac{\sum_{i=1}^{k} P_{ni}}{\sum_{i=1}^{k} P_{0i}} \times 100 \qquad (5.6)$$

where k is the number of commodities, $P_{ni}$ and $P_{0i}$ are the prices of the ith commodity in the current and the base years respectively, n denotes the current year, 0 denotes the base year and i indexes the commodities.

These indices suffer from the drawback that changes in the measuring units may affect the value of the index, which ultimately lessens their usefulness for making meaningful comparisons. Secondly, these indices use equal weights for all commodities, whereas all commodities are not equally important.


ii. Average Relative Index 


It is the average of simple index numbers calculated individually for each commodity. For k commodities it is calculated by the relation

$$P_{0n} = \frac{1}{k} \sum_{i=1}^{k} \frac{P_{ni}}{P_{0i}} \times 100 \qquad (5.7)$$

The simple index numbers $P_{ni}/P_{0i}$ are also called price relatives. Thus the average relative index is the average of price relatives. Usually the average used is the arithmetic mean, but other averages such as the geometric mean or median may be used.

These indices suffer from the drawback that they do not weight the different commodities according to their importance. However, changes in the measuring units do not affect the index value.


Example 5.3: Calculate the unweighted price index for 1994 when the procurement/support prices of agricultural commodities in rupees per 40 Kg in 1980 and 1994 are

Commodity | 1980 | 1994
Wheat     | 58   | 160
Rice      | 118  | 360
Potato    | 27   | 19
Onion     | 80   | 84
Solution:
i. Simple aggregate index
The simple aggregate price index for 1994 is

$$P_{0n} = \frac{\sum P_n}{\sum P_0} \times 100 = \frac{160+360+19+84}{58+118+27+80} \times 100 = \frac{623}{283} \times 100 = 220.14$$

This indicates that the prices of the above 4 commodities in 1994 are 120.14% higher than they were in 1980.


ii. Average relative index

The simple price index number for wheat is given by

$$\frac{P_n}{P_0} = \frac{160}{58} = 2.7586 \text{ or } 275.86\%$$

The simple price index for rice is given by

$$\frac{P_n}{P_0} = \frac{360}{118} = 3.0508 \text{ or } 305.08\%$$

Similarly, the index numbers for potato and onion are

$$\frac{19}{27} = 0.7037 \text{ or } 70.37\% \quad \text{and} \quad \frac{84}{80} = 1.05 \text{ or } 105\%$$

respectively. So, the average relative index for the commodities is given by
using arithmetic mean as average

$$P_{0n} = \frac{2.7586 + 3.0508 + 0.7037 + 1.05}{4} = 1.8908 \text{ or } 189.08\%$$

This index indicates that the prices are 89.08% higher in 1994 as compared with 1980.

using median as average

One can use the median as the average. The median of these 4 values is obtained by arranging them in ascending order as follows:

0.7037, 1.05, 2.7586, 3.0508

As 4 is an even number, the median is the average of the middle two values, i.e. (1.05 + 2.7586)/2 = 1.9043, so our index number is

1.9043(100)% = 190.43%


using geometric mean as average

G.M. = Antilog (Σ log Y / n)
     = Antilog (0.7937/4)
     = Antilog (0.1984) = 1.5791

where Y denotes the price relatives. So, the index is
(1.5791)(100) = 157.91


In this example, wheat and onion had equal weights but clearly wheat is 
produced more as compared with onion, so it is usually recommended to use weights 
in the index proportional to the value of the production of each item. 
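The unweighted indices of Example 5.3 can be checked with a short Python sketch. The variable names are assumptions chosen for this illustration, and the three averages come from Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later).

```python
from statistics import geometric_mean, mean, median

# Prices in rupees per 40 kg, as used in the solution of Example 5.3.
prices_1980 = {"wheat": 58, "rice": 118, "potato": 27, "onion": 80}
prices_1994 = {"wheat": 160, "rice": 360, "potato": 19, "onion": 84}

# Simple aggregate index: ratio of the price totals, as in (5.6).
simple_aggregate = sum(prices_1994.values()) / sum(prices_1980.values()) * 100

# Price relative of each commodity (current price / base price).
relatives = [prices_1994[c] / prices_1980[c] for c in prices_1980]

# Average relative index with three different averages.
mean_index = mean(relatives) * 100
median_index = median(relatives) * 100
gm_index = geometric_mean(relatives) * 100

print(round(simple_aggregate, 2))
print(round(mean_index, 2), round(median_index, 2), round(gm_index, 2))
```

The geometric mean gives the lowest of the three averages here because it is less affected by the two large relatives for wheat and rice.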







Example 5.4: Calculate index numbers of price, using 1962 as base, when

i) Mean ii) Median are used.

[Table: yearly prices of commodities, including firewood, soft coke and kerosene oil]


Solution:

[Table: price relatives of each commodity by year, their totals, and the index numbers obtained using the mean and the median]
Method for finding the average relative Index number. 


With the introduction of new commodities and quality goods in the market, the tastes and habits of the people change over time. This changes the relative importance of the commodities. In such situations, it becomes necessary to change the base year, and a quantity called the link relative is calculated instead of the price relative for each year. The link relative is computed by taking the price of the previous year as the base. Link relatives are also expressed as percentages, like price relatives.

$$\text{Link relative} = \frac{\text{Price of an item for current year}}{\text{Price of an item for previous year}} \times 100 = \frac{P_n}{P_{n-1}} \times 100$$


Link relatives are not directly comparable because they have no fixed base. To make them comparable, they are converted to what we call chain indices. The chain index for the first year is taken as 100, and then the chain index for each succeeding year is obtained by multiplying its link relative by the chain index of the preceding year and dividing the result by 100. One can note that these chain indices are just the price relatives computed by taking the first year as a base.
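The link relative and chain index rules described above can be sketched as follows. The function names are assumptions made for this illustration. The three prices are the wheat prices for 1980, 1981 and 1982 quoted in Example 5.6, and the last list checks the remark that, for a single commodity, the chain indices equal the fixed base price relatives.

```python
def link_relatives(prices):
    """Price of each year as a percentage of the previous year (first year = 100)."""
    relatives = [100.0]
    for previous, current in zip(prices, prices[1:]):
        relatives.append(current / previous * 100)
    return relatives


def chain_indices(relatives):
    """Chain index = link relative x preceding chain index / 100."""
    chain = [100.0]
    for link in relatives[1:]:
        chain.append(link * chain[-1] / 100)
    return chain


# Wheat prices for 1980, 1981 and 1982 as quoted in Example 5.6.
wheat_prices = [58, 60, 75]

links = link_relatives(wheat_prices)
chain = chain_indices(links)
fixed_base = [p / wheat_prices[0] * 100 for p in wheat_prices]

print([round(x, 2) for x in links])
print([round(x, 2) for x in chain])
print([round(x, 2) for x in fixed_base])
```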


Advantages of chain base methods
This method is rarely used because the price relatives give the same results. However, it has certain advantages.
i) Link relatives are useful to make year to year comparisons.
ii) New items can be substituted for old items provided the number of items remains the same.
iii) The weighting system can be changed according to the change in the relative importance of different items.
iv) Changes in the geographical coverage can be accommodated.


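Example 5.5: Compute link relatives and chain indices for the data of Example 5.2.

Year | Wages | Link relative | Chain index
1988 | 46 | 100 | 100
1989 | 51 | (51/46) x 100 = 110.87 | (110.87 x 100)/100 = 110.87
1990 | 58 | (58/51) x 100 = 113.73 | (113.73 x 110.87)/100 = 126.09
1991 | 71 | (71/58) x 100 = 122.41 | (122.41 x 126.09)/100 = 154.35
1992 | 71 | (71/71) x 100 = 100 | (100 x 154.35)/100 = 154.35
1993 | 86 | (86/71) x 100 = 121.13 | (121.13 x 154.35)/100 = 186.96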





Example 5.6: Find chain index numbers for the price data given below. The prices of the commodities are in Rs. per 40 Kg.

[Table: prices of wheat, rice, potato and onion by year, from 1980 onwards]


Solution: First we compute the link relatives by the relation

$$\text{Link relative} = \frac{P_n}{P_{n-1}} \times 100$$








The link relative for wheat for the year 1980 is taken as 100; for 1981 it is 60/58 x 100 = 103.45; for 1982 it is 75/60 x 100 = 125.00, and so on for all the other commodities. These are given below:

[Table: link relatives of wheat, rice, potato and onion by year, with their total and average]





The chain index for 1980 is 100. The chain index for 1981 is obtained by multiplying 100 by the link relative of 1981 and dividing by 100, i.e., (107.19)(100)/100 = 107.19; the chain index for 1982 is (109.72)(107.19)/100 = 117.61, and so on. These are given in the table below:

Year | Chain index
1980 | 100
1981 | (107.19 x 100.00)/100 = 107.19
1982 | (109.72 x 107.19)/100 = 117.61
1983 | (118.49 x 117.61)/100 = 139.36
5.4 Weighted index number

In weighted index numbers the different commodities in the combination receive weights proportional to their importance. These are of the following two types:

5.4.1. Weighted aggregate index 

It is the ratio of the sum of weighted commodity prices (quantities) in the current year to the sum of weighted commodity prices (quantities) in the base year, expressed as a percentage, the weights being the corresponding quantities (prices).

The price index, denoted by $P_{0n}$ (weighted), is calculated by the following formula.

$$P_{0n} = \frac{\sum p_{ni} q_{ni}}{\sum p_{0i} q_{0i}} \times 100 \qquad (5.8)$$

where $p_{ni}$, $p_{0i}$ are the prices of the commodities in the current and base year and $q_{ni}$, $q_{0i}$ are the corresponding quantities respectively.

Weighted index numbers are of various kinds. The most common are 
discussed below: 
a. Laspeyre's Index Number

It is named after the economist Etienne Laspeyres, who introduced it. It is denoted by $P_{0n}$ and is calculated by the following formula.

Index number for prices

$$P_{0n} = \frac{\sum_{i=1}^{k} p_{ni} q_{0i}}{\sum_{i=1}^{k} p_{0i} q_{0i}} \times 100 \qquad (5.9)$$

Here the weights are base year quantities. The idea of using base year quantities as weights for current year prices is that the quantities do not change over time. This is true for everyday consumer commodities, but for others an increase in price is followed by a decrease in the quantity consumed, so more weight is given to the commodities whose prices have increased.





Index number for quantities

$$Q_{0n} = \frac{\sum_{i=1}^{k} q_{ni} p_{0i}}{\sum_{i=1}^{k} q_{0i} p_{0i}} \times 100 \qquad (5.10)$$

Here the weights are base year prices.


b. Paasche's Index Number

It is a weighted index and, unlike Laspeyre's index, it uses current year quantities as weights rather than base year quantities for the price index. It is denoted by $P_{0n}$ and is calculated by the formula

$$P_{0n} = \frac{\sum_{i=1}^{k} p_{ni} q_{ni}}{\sum_{i=1}^{k} p_{0i} q_{ni}} \times 100 \qquad (5.11)$$

Unlike Laspeyre's index, this index gives less weight to the commodities whose prices have increased. Similarly, the quantity index number is

$$Q_{0n} = \frac{\sum_{i=1}^{k} q_{ni} p_{ni}}{\sum_{i=1}^{k} q_{0i} p_{ni}} \times 100 \qquad (5.12)$$
c. Fisher's ideal Index Number

This index number uses the fact that Laspeyre's index gives more weight to commodities whose prices have increased, whereas Paasche's index gives less weight to them, so there should be an index number that lies between these two. This aim is achieved by taking the geometric mean of these index numbers. So, Fisher's index number $P_{0n}$ is given by

$$P_{0n} = \sqrt{\text{Laspeyre} \times \text{Paasche}} \qquad (5.13)$$

$$P_{0n} = \sqrt{\frac{\sum p_{ni} q_{0i}}{\sum p_{0i} q_{0i}} \times \frac{\sum p_{ni} q_{ni}}{\sum p_{0i} q_{ni}}} \times 100 \qquad (5.14)$$
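The three weighted price indices can be sketched in Python as below. The function names and the small data set of three commodities are hypothetical, chosen only for illustration; they are not taken from the text. The sketch also shows that Fisher's index falls between the other two.

```python
from math import sqrt


def laspeyres(p0, q0, pn):
    """Price index weighted by base year quantities, as in (5.9)."""
    return sum(p * q for p, q in zip(pn, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100


def paasche(p0, pn, qn):
    """Price index weighted by current year quantities, as in (5.11)."""
    return sum(p * q for p, q in zip(pn, qn)) / sum(p * q for p, q in zip(p0, qn)) * 100


def fisher(p0, q0, pn, qn):
    """Geometric mean of the Laspeyres and Paasche indices, as in (5.13)."""
    return sqrt(laspeyres(p0, q0, pn) * paasche(p0, pn, qn))


# Hypothetical prices and quantities for three commodities.
p0 = [10, 8, 5]   # base year prices
q0 = [2, 3, 4]    # base year quantities
pn = [12, 10, 6]  # current year prices
qn = [3, 2, 5]    # current year quantities

L = laspeyres(p0, q0, pn)
P = paasche(p0, pn, qn)
F = fisher(p0, q0, pn, qn)

print(round(L, 2), round(P, 2), round(F, 2))
```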
Example 5.7: Compute the following index numbers from the data given below, using 1964 as base.
i) Laspeyre's Index ii) Paasche's Index iii) Fisher's Index

[Table: prices and quantities of the items in the base year 1964 and in the current year]


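Solution:

[Table: prices $p_0$, $p_n$ and quantities $q_0$, $q_n$ of each item, with the products $p_0 q_0$, $p_n q_0$, $p_n q_n$ and $p_0 q_n$ and their totals]

i) Laspeyre's Index $= \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100 = \frac{505}{425} \times 100 = 118.82$

ii) Paasche's Index $= \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100 = \frac{530}{480} \times 100 = 110.42$

iii) Fisher's Index $= \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}} \times 100 = \sqrt{\frac{505}{425} \times \frac{530}{480}} \times 100 = 114.54$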
5.5 Consumer Price Index (CPI) and Wholesale Price Index (WPI)

The Federal Bureau of Statistics (FBS) calculates the following price indices in the country to measure price changes over time.

i) Consumer Price Index (CPI)
ii) Sensitive Price Index (SPI)
iii) Wholesale Price Index (WPI)

We shall discuss their meaning and construction with reference to Pakistan.


5.5.1 Consumer Price Index (CPI) 


The CPI is constructed to measure the aggregate change in the cost of a fixed basket of goods and services purchased at current prices, as compared with its cost in a given period called the base, which is always taken as hundred. It is also called the cost of living index.


The CPI was computed for the first time in early 1950’s with base 1948-49 for 
industrial workers in Lahore, Karachi and Sialkot only as a measure of inflation. 
These days, it is being calculated for four different income groups and occupational 
groups in 25 big cities of the country and covers 464 items of consumption in the 
basket of goods and services which represent their taste, habits and customs. 


To construct the CPI, the prices of the items to be included are sampled from different locations. The weights given to different commodities reflect their importance, to make the indices reliable. The weight assigned to each commodity is the average percentage expenditure on it out of the total expenditure of a family. The present weights are based on the results of a Family Budget Survey conducted for this purpose. Due to changes in income, taste and other seasonal and geographical factors, the weights are different for different income groups, occupational categories, cities and various commodity groups.
5.5.2 Construction of CPI

The construction of a consumer price index number involves the following steps:

1. Deciding the category of the people
2. Family Budget inquiry
3. Selection of items
4. Price quotations
5. Choice of weights

1. Deciding category of the people.

The first and the most important step in the construction of consumer price index numbers is the decision regarding the category of people for whom the index numbers are going to be constructed. It should be decided beforehand whether they are for clerks, industrial workers, coolies, etc.

2. Family Budget Inquiry

After deciding the category of the people, an adequate number of families should be selected during a normal period. The family budget inquiry is conducted on the basis of random sampling. This inquiry gives information regarding

i) the qualities and quantities of the items consumed by them under different heads such as food articles, clothing, house rent, fuel, lighting, education, gifts, newspapers, transport etc.
ii) the retail prices of the items.
iii) the amount spent on various items.


3. Selection of Items

With the help of the family budget inquiry, it becomes easier to select the items which should be included in the construction of consumer price index numbers. Only those items should be included which are largely used by that class of people and which are not subject to wide variation in quantity, supply and prices.


4. Price Quotations 


The price quotations should be retail prices and not wholesale prices. The prices should be obtained from the shops, publications of the government and official reporters of the locality decided upon.
5. Choice of Weights

All the items that enter into the family budget are not of equal importance, and thus weights must be assigned to the items. There are two types of weights:

i) Quantity Basis: It means the quantity of the different items consumed in the base year.

ii) Value Basis: It means the total value of the items consumed by each group. It is calculated by multiplying the quantities consumed by the prices.

There are two methods.

a) Aggregative Expenditure Method

According to this method, the quantities consumed in the base year are used as weights. It is the base year weighted index given by Laspeyre's formula

$$P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$$

b) Family Budget Method

This method is the weighted average of price relatives. In this method, the family budgets of a large number of families are carefully studied and the aggregate expenditure of the average family on various items is estimated. The amounts of money spent by the families concerned are calculated from a family budget inquiry. The formula is

$$P_{0n} = \frac{\sum I W}{\sum W}$$

where $I = \frac{p_n}{p_0} \times 100$ and $W = p_0 q_0$.
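The two methods can be compared with a small Python sketch. The function names and the data are hypothetical and serve only as an illustration. With weights W = p0 q0, the family budget method gives the same value as the aggregative expenditure method, because each product IW reduces to 100 x pn q0.

```python
def aggregative_expenditure(p0, q0, pn):
    """Base year weighted (Laspeyre's) index: sum(pn*q0) / sum(p0*q0) x 100."""
    return sum(p * q for p, q in zip(pn, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100


def family_budget(p0, q0, pn):
    """Weighted average of price relatives: sum(I*W) / sum(W)."""
    relatives = [n / b * 100 for n, b in zip(pn, p0)]  # I = pn / p0 x 100
    weights = [b * q for b, q in zip(p0, q0)]          # W = p0 x q0
    return sum(i * w for i, w in zip(relatives, weights)) / sum(weights)


# Hypothetical base year prices, base year quantities and current year prices.
p0 = [10, 8, 5]
q0 = [2, 3, 4]
pn = [12, 10, 6]

index_a = aggregative_expenditure(p0, q0, pn)
index_b = family_budget(p0, q0, pn)

print(round(index_a, 3), round(index_b, 3))
```

In practice the family budget method is convenient when only expenditure shares are known, since the weights W can then be given directly as percentages of total expenditure.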
Example 5.8: An inquiry into the budgets of the middle class families in a city of England gave the following information:

[Table: expenditure shares and prices in 1928 and 1929 for each group of expenses, including Rent 15%, Clothing 20% and Fuel 10%]

What change in the cost of living do the figures of 1929 show as compared to 1928?





Solution:

Cost of living index for 1929 $= \frac{\sum IW}{\sum W} = \frac{9815}{100} = 98.15$

Interpretation of CPI computed and circulated by FBS 


The following tables are taken from the Pakistan Economic Survey (1993- 
1994) where, CPI has been calculated by the Federal Bureau of Statistics (FBS) for 
items and for different income groups. 





[Table: Consumer Price Index (on annual basis), with percentage point contribution, for the groups General; Food, beverages and tobacco; House rent; Fuel and lighting; Household furniture and equipment; Miscellaneous]


The table indicates that the highest increase recorded during the period is 22.11%, in the miscellaneous group, followed by the fuel and lighting group, which recorded an increase of 14.74%. The lowest increase was recorded in the household furniture and equipment group.

Since weights vary across the commodity groups, the highest contribution to the overall CPI increase has been made by the food, beverages and tobacco group. The contribution is calculated by computing the ratio 100/49.90 = 2.004 (49.90 being the weight of the group) and then dividing the change 10.40 by it, i.e., 10.40/2.004 = 5.19. Similarly, for the house rent group, 100/17.76 = 5.63 and 9.83/5.63 = 1.75, and so on.
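The percentage point contributions quoted above can be reproduced with a short Python sketch. The function name is an assumption made for this illustration, and the figures are the weights and percentage changes quoted in the text for the food group and the house rent group.

```python
def point_contribution(change_percent, weight_percent):
    """Percentage point contribution: the change divided by (100 / weight)."""
    return change_percent / (100 / weight_percent)


# Figures quoted in the text: (change, weight) for two commodity groups.
food = point_contribution(10.40, 49.90)
house_rent = point_contribution(9.83, 17.76)

print(round(food, 2))
print(round(house_rent, 2))
```

Dividing by 100/weight is the same as multiplying the change by the weight expressed as a fraction of 100.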


The following table gives the consumer price index for different income groups in Pakistan.

Income group | 1994/1993 | Ratio of group index to overall CPI
I. Up to Rs. 1000 | 8.27 | 97.18
II. Rs. 1001-2500 | 8.56 | 100.59
III. Rs. 2501-4500 | 8.91 |
IV. Above Rs. 4500 | 8.67 |
All groups (combined) | 8.51 |
The table indicates that there is an increasing trend in the indices of the first three income groups, from 8.27 to 8.91; the index for the fourth income group is 8.67, whereas the combined index is 8.51. The last column is the ratio of the individual group index to the overall index: for the first group it is (8.27/8.51)(100) = 97.18; for the second group it is (8.56/8.51)(100) = 100.59, and so on.


As the CPI is used to cancel out the effect of inflation, these indices suggest that measures should be taken to protect groups I and II as compared with the other groups.


CPI and rate of inflation 


The common way to measure inflation in Pakistan is through the CPI. To calculate the annual rate of inflation, one compares the current year's CPI with that of the previous year. The average annual rate of inflation over a longer period can be calculated by averaging the annual rates for those years. The inflation rates for the years mentioned below are computed using the CPI with base 1980-1981. The rate of inflation for 1989-1990 is [(177.33 - 167.23)/167.23] x 100 = 6.04, and similarly for the others.

[Table: CPI (base 1980-81) and rate of inflation for the years 1988-89 to 1993-94]





The annual average rate of inflation for 5 years is 48.76/5 = 9.75% 
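The inflation calculations above can be sketched in Python. The function name is an assumption made for this illustration. The two CPI values and the total of 48.76 for the five annual rates are the figures quoted in the text; the individual yearly rates are not reproduced here.

```python
def inflation_rate(cpi_previous, cpi_current):
    """Annual rate of inflation (percent) from two consecutive CPI values."""
    return (cpi_current - cpi_previous) / cpi_previous * 100


# CPI values (base 1980-81) quoted in the text for 1988-89 and 1989-90.
rate_1989_90 = inflation_rate(167.23, 177.33)

# Total of the five annual rates as quoted in the text.
total_of_rates = 48.76
average_rate = total_of_rates / 5

print(round(rate_1989_90, 2))
print(round(average_rate, 2))
```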


Measurement of purchasing power of money

The inverse of the CPI can be used to measure the purchasing power of money. Since the base of the CPI is 100 and the Pakistani rupee is also divisible into 100 paisas, the purchasing power of the rupee is [1/CPI] x 100. By this approach, the purchasing power of the Pakistani rupee in January 1995 as compared to 1980-1981 has come down to 33 paisas only.
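The purchasing power formula can be sketched as follows. The function name and the CPI value of 300 are hypothetical, chosen only to illustrate the arithmetic; the text does not state the CPI for January 1995.

```python
def purchasing_power(cpi):
    """Purchasing power of one rupee relative to the base period: (1 / CPI) x 100."""
    return 1 / cpi * 100


# Hypothetical CPI value (base period = 100).
cpi = 300.0

rupees = purchasing_power(cpi)  # value of one rupee in base period rupees
paisas = rupees * 100           # the same value expressed in paisas

print(round(rupees, 2), round(paisas))
```

With a CPI of about 300, one rupee buys what roughly 33 paisas bought in the base period.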


5.5.2 Sensitive Price Indicator (SPI) 


SPI is calculated in the same way with the same formula as the CPI but the 
difference is that it includes only 46 essential commodities instead of 464 in the CPI. 


5.5.3 Wholesale price index (WPI) 


This index indicates changes in producers' selling prices and is not an indicator of wholesale prices, as the name would suggest.

It is computed from information collected by sampling producers' selling prices. Weights are derived on the basis of the value of the marketable surplus of commodities available for sale. The index is computed with the same formulas as the CPI. The following table gives the wholesale price indices of selected items taken from the Pakistan Economic Survey (1993-94).


[Table: Wholesale Price Index, with percentage point contribution, for the groups General (12.71); Food; Raw material; Fuel, lighting and lubricants; Manufacturers; Building material]
The table indicates that the wholesale price index increased by 12.71 percent. The highest increase of 22.65% is recorded in the 'Lubricants' group and, on the other hand, the lowest increase of 6.84% is recorded in the 'Manufacturers' group. The percentage point contribution in the last column is calculated in the same way as for the consumer price index in the previous table. It should be noted that in these indices it is valid to compare adjacent years, such as the value of 106.30 in 1992 and the value of 110.40 in 1993. The year to year change is 110.40 - 106.30 = 4.10 points, or

$$\frac{4.10}{106.30} \times 100 = 3.86\%$$


Exercise 5 Ans on Page 253 


5.1 What is an index number? Give the uses of an index number. 


5.2 Define an index number. Discuss the main points involved in the construction 
of index numbers of prices. 


5.3 Define an index number and describe the different types of index number. 

5.4 Discuss the important problems involved in the construction of index number 
of prices. 

5.5 Compare the following concepts: 
i. Simple and composite index. 
ii. Fixed and chain base method. 

5.6 What is a weighted index number?


5.7 Find the index number of price from the following data taking average price 
of all years as the base. 


5.8 Given the prices of a commodity per maund for the period 1945 to 1960 as:

Year | Price | Year | Price | Year | Price
1945 |       | 1950 | 24.85 | 1955 | 15.65
1946 | 18.97 | 1951 | 20.90 | 1956 | 16.15
1947 | 19.70 | 1952 | 19.80 | 1957 | 20.20
1948 | 13.50 | 1953 | 23.65 | 1958 | 2.29
1949 | 15.65 | 1954 | 24.55 | 1959 |

Construct index numbers correct to 2 decimal places, taking
i. 1945 as base.
ii. Average price of all the years as base price.
5.9 Find index number using 

i. 1977 as base 

ii. average of the prices as base:

[Table: Years | Prices]


5.10 For the following data, find index numbers taking 
i. 1930 as base ii. Average of Ist 3 years as base 
iii. the year 1935 as base 









5.11 The following figures show the wholesale prices of refined petroleum per 
gallon in UK for the year specified. On the basis of 1923=100, construct a 
series of price relatives. 



















5.13 The prices in Rs. per maund of coal sold during the years 1953-58 are given below. Compute index numbers of price taking the year 1953 as base.

Year | Price | Year | Price | Year | Price
1953 | 14.95 | 1955 | 15.10 | 1957 | 16.28
1954 | 14.95 | 1956 | 15.65 | 1958 | 16.28









5.14 From the data given below, compute, the index numbers of prices, taking 
1962 as base. 












[Table: Year | Firewood | Soft coke | Kerosene oil (prices in Rs.)]


5.15 Compute index numbers of prices from the following data taking 1981 as 
base and using median as average. 











5.16 Find chain index numbers (using geometric mean to average the relatives) 
for the following data of prices, taking 1970 as the base year. 




5.17 The following table gives the average whole sale prices in rupees per unit of 


gold, wheat, cotton during the year 1912-1917. Construct index number with 
1912 as base using 


i) A.M. ii) G.M.

[Table: prices of gold, wheat and cotton for the years 1912 to 1917]






















5.19 Construct index numbers for 1963 assuming 1953 as base period by 


i) Laspeyre's formula ii) Paasche's formula.

[Table: data for each commodity in 1953 and 1963]







5.20 Compute the weighted index numbers for 1964 from the following data with 
1960 as base. 


[Table: commodities with their quantities and other data for 1960 and 1964]


5.22 Construct Fisher's ideal index, taking 1970 as base, with the help of the data given below.

[Table: Commodities | Total Production (tons) | Harvest Price (Rs)]


5.24 Define weighted and unweighted index numbers and explain why weighted index numbers are preferred over unweighted index numbers.


5.25 Calculate Laspeyre's, Paasche's and Fisher's ideal index for the following 
data with 1992 as base. 






Average price (Rs) Quantity (Units) 
1993 
14.15 15.58 10Kg 12Kg 








5.26 The following figures give the average annual prices in U.K. for beef and 
mutton. 





Construct index numbers of meat prices giving weights 2 and 1 for beef and mutton respectively. Take the year 1935 as the base year.


5.27 The following table shows the average price in rupees for wheat, rice and 
barley. 





Taking 1980 as base year, construct the price index number by the weighted average of relatives method for the year 1981, using the weights 20 for wheat, 12 for rice and 4 for barley.


5.28 An inquiry into the budgets of the middle class families in a city of England gave the following information. What change in the cost of living do the figures of 1929 show as compared with 1928?

[Table: expenditure shares and prices in 1928 and 1929 for each group of expenses, including Clothing 20%, Fuel 10% and Misc. 20%]
5.29 Fill in the blanks:

i) The changes in wholesale or retail prices are studied in ______.
ii) The volume or quantity of goods is compared by ______.
iii) In ______ both quantities and prices are used.
iv) Index numbers are used for ______ business activity and in discovering ______ fluctuation and business ______.
v) The purpose of index number may be ______ or ______.
vi) The two methods of selecting base periods are ______ and ______.
vii) The base period in fixed base should be ______.
viii) ______ process is a must in ______ method for comparison purposes.
ix) Geometric mean is a suitable average in ______ method.
x) Un-weighted indices are classified into simple ______ indices and simple ______.

5.30 Against each statement, write T for true and F for false.

i) Six steps are involved in the construction of index numbers of prices.
ii) In price relative, the given year price is divided by the base year price.
iii) Laspeyre's Index number is also named as current year weighted index number.
iv) Fisher's Index number is the Geometric Mean of the Laspeyre's and Paasche's Index number.
v) Aggregate expenditure method and Family Budget method are the types of the consumer price index number.
vi) The Index numbers are calculated in percentages.
vii) Index numbers are statistical barometers.
viii) In chain base method the base year is fixed.
ix) The most suitable average for Index numbers is Harmonic Mean.
x) The smaller the size of the sample, the greater would be the accuracy.
6.1 Introduction


Suppose you bought seven tickets for a raffle out of 700 tickets sold altogether. Each
of the 700 tickets is as likely as any other to be drawn for the first prize. You would say that
you had 7 chances out of 700, or a single chance out of 100, of winning the first prize.


Probability gives us a measure of the likelihood that something will happen.
However, probability cannot predict the number of times that an occurrence actually
happens. Most of the decisions that affect our daily lives are based upon likelihood
and not on absolute certainty.


In this chapter, we shall study methods to deal with problems concerned with
chance events.


Some definitions, terminologies and notations are explained below to enable
the student to describe certain categories of situations precisely and more briefly.


Sets: A set is a well defined collection of distinct objects. The objects making up a
set are called its elements. A set is usually denoted by a capital letter, i.e.,
A, B, C etc., while its elements are denoted by small letters, i.e., a, b, c etc. For
example, the set A that consists of the first five positive integers can be described as:


A = {1, 2, 3, 4, 5}


Here, since 3 belongs to set A, we write 3 ∈ A and read it as 3 belongs to A,
while since 6 does not belong to set A, we write 6 ∉ A and read it as 6 does not belong
to A.


Null Set: A set that contains no element is called a null set or empty set. It is denoted
by { } or ∅.


Subset: If every element of a set A is also an element of a set B, A is said to be a subset
of B and it is denoted by A ⊆ B.


Proper Subset: If A is a subset of B, and B contains at least one element which is not
an element of A, A is said to be a proper subset of B and is denoted by A ⊂ B.


Finite and Infinite Sets: A set is finite if it contains a specific number of elements,
i.e., while counting the members of the set, the counting process comes to an end;
otherwise the set is an infinite set. For example, A = {1, 2, 3, 5}, B = {x, y, z, t, u}
and C = {x | x is a month of the year} are finite sets,
whereas D = {2, 4, 6, 8, ...} and E = {y | y is a point on a line} are infinite sets.


Universal Set: A set consisting of all the elements of the sets under consideration is
called the universal set. It is denoted by U.


For example, if A = {1, 2, 3}, B = {2, 4, 5, 6}, C = {8, 10} then U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.


Disjoint Sets: Two sets A and B are said to be disjoint sets if they have no
elements in common, i.e., if A ∩ B = ∅, A and B are disjoint sets. For example,


A = {6, 8, 10, 12} and B = {1, 4, 9, 11}


are disjoint sets as A ∩ B = ∅.


Overlapping Sets: Two sets A and B are said to be overlapping sets if they have
at least one element in common, i.e., if A ∩ B ≠ ∅ and neither of them is a subset of
the other set, then A and B are overlapping sets.


For example, A = {1, 3, 5, 7} and B = {1, 4, 8} are overlapping sets, as
A ∩ B = {1} ≠ ∅ and neither A nor B is a subset of the other.


Chapter 6: Probability


Venn Diagram: 

Venn diagram is a diagram in which universal set U is represented by a 
rectangle and its subset is represented by a circle. In other words Venn diagram 
represents the relationship between sets by means of diagram. 


Union of Sets: Union of two sets A and B is a set that contains the elements belonging
either to A or to B or to both. It is denoted by A ∪ B and read as A union B. For example, if
A = {1, 2, 3, 4, 5} and B = {2, 4, 6, 8, 10}

then A ∪ B = {1, 2, 3, 4, 5, 6, 8, 10}

(Venn diagram: A ∪ B is the shaded area.)

Let A = {2, 4, 6}
    B = {1, 3, 5}
then A ∪ B = {1, 2, 3, 4, 5, 6}

(Venn diagram for disjoint sets: A ∪ B is the shaded area.)


Intersection of Sets: Intersection of two sets A and B is a set that contains the elements
belonging to both A and B. It is denoted by A ∩ B and read as A intersection B. For
example, if

A = {1, 2, 3, 4, 5, 6} and B = {2, 3, 6, 7}

then A ∩ B = {2, 3, 6}

(Venn diagram: A ∩ B is the shaded area.)


Difference of Sets: The difference of a set A and a set B is the set that contains the
elements of the set A which are not contained in B. The difference of sets A and B is denoted
by A − B. For example,
if A = {1, 2, 3, 4, 5, 6, 7, 8} and B = {2, 4, 6, 8}

then A − B = {1, 3, 5, 7}

(Venn diagram: A − B is the shaded area.)


Complement of a Set: Complement of a set A, denoted by Ā or A′, is defined as Ā = U − A.
For example, if

U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

and A = {2, 4, 6, 8}

then Ā = U − A = {1, 3, 5, 7, 9, 10}

(Venn diagram: Ā is the shaded area.)





Example 6.1:

If S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, A = {1, 2, 3, 4}, B = {2, 4, 6, 8}, C = {1, 3, 5, 7}
and D = {2, 4}, then find:

i) A ∪ B   ii) A ∪ C   iii) A ∩ C   iv) C ∩ B   v) C̄   vi) Ā

Solution: S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, A = {1, 2, 3, 4},
B = {2, 4, 6, 8}, C = {1, 3, 5, 7} and D = {2, 4}

i) A ∪ B = {1, 2, 3, 4, 6, 8}          ii) A ∪ C = {1, 2, 3, 4, 5, 7}
iii) A ∩ C = {1, 3}                    iv) C ∩ B = ∅
v) C̄ = {2, 4, 6, 8, 9, 10}             vi) Ā = {5, 6, 7, 8, 9, 10}
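
The set operations used in Example 6.1 can be checked with a short program. The following is a minimal sketch in Python, assuming its built-in set type; the variable names are illustrative only and do not come from the text.

```python
# Sets from Example 6.1; S plays the role of the universal set.
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {2, 4, 6, 8}
C = {1, 3, 5, 7}

union_ab = A | B        # union of A and B
union_ac = A | C        # union of A and C
inter_ac = A & C        # intersection of A and C
inter_cb = C & B        # intersection of C and B (empty: C and B are disjoint)
comp_c = S - C          # complement of C relative to S
comp_a = S - A          # complement of A relative to S

for label, value in [
    ("A union B", union_ab),
    ("A union C", union_ac),
    ("A intersection C", inter_ac),
    ("C intersection B", inter_cb),
    ("complement of C", comp_c),
    ("complement of A", comp_a),
]:
    print(label, "=", sorted(value))
```

An empty result, as for the intersection of C and B, corresponds to the null set ∅.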


Factorial: If n is a positive integer, the product of the first n positive integers, denoted by n!, is
called factorial of n.

n! = n(n − 1)(n − 2) .... 3.2.1

This can also be written as:

n! = n(n − 1)! = n(n − 1)(n − 2)!
For example,

2! = 2 × 1 = 2

4! = 4 × 3 × 2 × 1 = 24

10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3628800


6.2 Permutations 


An arrangement of a finite number of objects in a definite order is called a
permutation of these objects. The number of ways of arranging n objects taken r at a
time is denoted by ⁿPᵣ and is defined as:

ⁿPᵣ = n! / (n − r)!

The number of permutations of n objects, out of which n₁ are alike of one kind,
n₂ are alike of a second kind and so on, and nₖ are alike of the kth kind, is given by

n! / (n₁! n₂! ... nₖ!)


Example 6.2: How many distinct four-digit numbers can be formed from the
following integers 1, 2, 3, 4, 5, 6 if each integer is used only once?

Solution: ⁿPᵣ = n! / (n − r)!
Here, n = 6 and r = 4

⁶P₄ = 6! / (6 − 4)! = 6! / 2! = (6 × 5 × 4 × 3 × 2 × 1) / (2 × 1) = 360

Example 6.3: Evaluate: i) ⁵P₃   ii) ¹⁰P₆
Solution:
i) ⁵P₃ = 5! / (5 − 3)!
       = 5! / 2!
       = (5 × 4 × 3 × 2 × 1) / (2 × 1)
       = 60

ii) ¹⁰P₆ = 10! / (10 − 6)!
         = 10! / 4!
         = (10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1) / (4 × 3 × 2 × 1)
         = 151200


6.3 Combinations


When a selection of objects is made without paying regard to the order of
selection, it is called a combination. The number of combinations of n things taken r at
a time is denoted by ⁿCᵣ and is defined by

ⁿCᵣ = n! / (r! (n − r)!)











Example 6.4: Evaluate: i) ⁵C₃   ii) ⁶C₂   iii) ⁸C₅
Solution:
i) ⁵C₃ = 5! / (3! (5 − 3)!) = 5! / (3! 2!) = 10

ii) ⁶C₂ = 6! / (2! (6 − 2)!) = 6! / (2! 4!) = 15

iii) ⁸C₅ = 8! / (5! (8 − 5)!) = 8! / (5! 3!) = 56

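
The permutation and combination formulas above translate directly into code. The sketch below is one possible way to check Examples 6.2 to 6.4; the function names permutations_count and combinations_count are hypothetical helpers introduced here, and the cross-check assumes Python 3.8 or later, where math.perm and math.comb are available.

```python
import math


def permutations_count(n, r):
    """Number of arrangements of n objects taken r at a time: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)


def combinations_count(n, r):
    """Number of selections of n objects taken r at a time: n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# Examples 6.2 and 6.3 (permutations)
print(permutations_count(6, 4))
print(permutations_count(5, 3))
print(permutations_count(10, 6))

# Example 6.4 (combinations)
print(combinations_count(5, 3))
print(combinations_count(6, 2))
print(combinations_count(8, 5))

# Cross-check against the standard library (Python 3.8+)
assert permutations_count(10, 6) == math.perm(10, 6)
assert combinations_count(8, 5) == math.comb(8, 5)
```

Integer division is exact here because (n − r)! and r! (n − r)! always divide n!.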
Random Experiment: Random experiment is an experiment which produces 
different outcomes even if it is repeated a large number of times under similar 
conditions. A random experiment has the following properties: 
i) The experiment can be repeated any number of times. 
ii) A random trial consists of at least two possible outcomes. 


Sample Space: A set representing all possible outcomes of a random experiment is
called sample space. It is denoted by S. Each element in a sample space is called a
sample point. For example, when a coin is tossed, the sample space is given by
S = {H, T}
If a coin is tossed two times, the sample space is given by
S = {HH, HT, TH, TT}
In throwing a die, the sample space is given by
S = {1, 2, 3, 4, 5, 6}
When two dice are thrown, the sample space is given by
S = { (1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
      (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
      (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
      (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
      (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
      (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) }
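
Sample spaces such as these can also be listed mechanically. The following is a small sketch, assuming Python's itertools.product is used to form all ordered outcomes; the names coin, two_tosses and two_dice are illustrative only.

```python
from itertools import product

# Sample space for tossing one coin twice: every ordered pair of H and T.
coin = ["H", "T"]
two_tosses = ["".join(outcome) for outcome in product(coin, repeat=2)]
print(two_tosses)

# Sample space for throwing two dice: every ordered pair (first die, second die).
die = range(1, 7)
two_dice = list(product(die, repeat=2))
print(len(two_dice))   # number of sample points
print(two_dice[:6])    # first row of the 6 x 6 layout
```

Each additional toss or die multiplies the number of sample points, which is why two coins give 2 × 2 points and two dice give 6 × 6 points.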


Event: Any subset of the sample space is called an event. In a sample space there
can be two or more events consisting of sample points. For example, when a fair die
is rolled, the coming up of an even number is an event, i.e., {2, 4, 6} is an
event. Similarly, the coming up of an odd number is an event, i.e., {1, 3, 5} is an event.


Simple Event: If an event consists of one sample point, it is called a simple event.
For example, when two coins are tossed, the event {TT} that two tails appear is a
simple event.


Compound Event: If an event consists of more than one sample point, it is called a
compound event. For example, when two dice are rolled, the event B that the sum of the two
faces is 4, i.e., B = {(1, 3), (2, 2), (3, 1)}, is a compound event.


Independent Events: Two events A and B are said to be independent, if the 
occurrence of one does not affect the occurrence of the other. For example, in tossing 
two coins, the occurrence of a head on one coin does not affect in any way the 
occurrence of a head or tail on the other coin. 


Dependent Events: Two events A and B are said to be dependent, if the occurrence
of one event affects the occurrence of the other event.


Mutually Exclusive Events: Two events A and B are said to be mutually exclusive,
if they cannot occur together, i.e., A ∩ B = ∅

In other words, two events are called mutually exclusive events if they are
disjoint. For example, in a toss of a coin, either the head or the tail will appear, but they
cannot appear together. The appearance of head and the appearance of tail are
mutually exclusive.


Equally Likely Events: Two events are said to be equally likely, if they have the
same chance of occurrence. For example, when a coin is tossed, a head is just as likely to
occur as a tail.


Exhaustive Events: When a sample space S is partitioned into some mutually
exclusive events such that their union is the sample space itself, the events are called
exhaustive events. Let a die be rolled; the sample space is given by

S = {1, 2, 3, 4, 5, 6}

Let A = {1, 2}, B = {3, 4}, C = {5, 6}


A, B and C are mutually exclusive events and their union A ∪ B ∪ C = S is
the sample space, so the events A, B and C are exhaustive.


6.4 Probability 


Classical or "a priori" definition: If there are n equally likely, mutually exclusive
and exhaustive outcomes, m of which are favourable to the occurrence of an event
A, then the probability of the occurrence of the event A, denoted by P(A), is defined by
the ratio m/n, i.e.,

P(A) = (no. of favourable outcomes) / (no. of possible outcomes) = m/n


Relative frequency or a posteriori definition: If an experiment is repeated a large
number of times, say n, under uniform conditions and if the event A occurs m times,
then the probability of the occurrence of the event A is defined by the relative
frequency m/n, which approaches a limit as n increases, i.e.,

P(A) = lim (m/n) as n → ∞

Mathematical Definition: The probability that an event A will occur is the ratio of
the number of sample points in A to the total number of sample points in S, i.e.,

P(A) = (no. of sample points in A) / (no. of sample points in S) = n(A) / n(S)


P(A) must satisfy the following axioms:

i) P(A) ≥ 0, which means that the probability of an event cannot be negative.

ii) 0 ≤ P(A) ≤ 1, i.e., the probability of an event lies between 0 and 1.

iii) P(S) = 1, which means that the sum of the probabilities is equal to one.

iv) If A and B are two mutually exclusive events, then
P(A ∪ B) = P(A) + P(B)
Example 6.5: A student solved 128 questions from the first 200 questions of a book to
be solved. What is the probability that he will solve all the remaining questions?
Solution: n = 200, m′ = 128, m = n − m′ = 200 − 128 = 72

P(A) = m/n = 72/200 = 0.36


Example 6.6: A bag contains 4 red and 6 green balls, out of which 3 balls are drawn.
Find the probability of drawing

i) 2 red and 1 green balls.     ii) all red balls.
iii) one green ball.            iv) no red ball.

Solution:
Red balls = 4, Green balls = 6, Total balls = 10
Balls to be drawn = 3
Sample space: n(S) = ¹⁰C₃ = 120

i) Let A be the event of drawing 2 red and one green ball.
n(A) = ⁴C₂ × ⁶C₁ = 6 × 6 = 36
P(A) = n(A) / n(S) = 36/120 = 0.30

ii) Let B be the event of drawing all red balls.
n(B) = ⁴C₃ × ⁶C₀ = 4 × 1 = 4
P(B) = n(B) / n(S) = 4/120 = 0.033

iii) Let C be the event of drawing one green ball (and hence two red balls).
n(C) = ⁶C₁ × ⁴C₂ = 6 × 6 = 36
P(C) = n(C) / n(S) = 36/120 = 0.30

iv) Let D be the event of drawing no red ball.
n(D) = ⁴C₀ × ⁶C₃ = 1 × 20 = 20
P(D) = n(D) / n(S) = 20/120 = 0.167
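
The counting in Example 6.6 follows one pattern: choose k of the 4 red balls and the remaining 3 − k of the 6 green balls, then divide by the number of ways of choosing any 3 of the 10 balls. A minimal sketch of that pattern, assuming Python 3.8+ for math.comb and using the hypothetical helper name prob_red, is given below.

```python
from fractions import Fraction
from math import comb

RED, GREEN, DRAWN = 4, 6, 3          # bag contents and number of balls drawn
n_S = comb(RED + GREEN, DRAWN)       # number of points in the sample space


def prob_red(k):
    """Probability that exactly k of the DRAWN balls are red."""
    favourable = comb(RED, k) * comb(GREEN, DRAWN - k)
    return Fraction(favourable, n_S)


p_two_red_one_green = prob_red(2)    # part (i)
p_all_red = prob_red(3)              # part (ii)
p_one_green = prob_red(2)            # part (iii): one green means two red
p_no_red = prob_red(0)               # part (iv)

for label, p in [("i", p_two_red_one_green), ("ii", p_all_red),
                 ("iii", p_one_green), ("iv", p_no_red)]:
    print(label, p, round(float(p), 3))
```

Exact fractions are used so that the four possible red counts (0, 1, 2, 3) can be seen to add up to exactly one.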


Example 6.7: If two fair dice are thrown, what is the probability of getting

i) a double six.     ii) a sum of 8 or more dots.

Solution: Sample space is given by

S = { (1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
      (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
      (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
      (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
      (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
      (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) }

⇒ n(S) = 36

i) Let A be the event that a double six occurs.
A = {(6,6)} ⇒ n(A) = 1
P(A) = n(A) / n(S) = 1/36

ii) Let B be the event that a sum of 8 or more dots occurs.
B = { (2,6), (3,5), (4,4), (5,3), (6,2), (3,6), (4,5), (4,6),
      (5,4), (5,5), (5,6), (6,3), (6,4), (6,5), (6,6) }
⇒ n(B) = 15
P(B) = n(B) / n(S) = 15/36 = 5/12
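
The same answers can be obtained by letting a program count the favourable sample points. This is a sketch only, assuming all 36 points are equally likely; the helper name probability is introduced here for illustration and is not from the text.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two fair dice.
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """n(A) / n(S), where event is a rule that says whether a point belongs to A."""
    favourable = [point for point in sample_space if event(point)]
    return Fraction(len(favourable), len(sample_space))


p_double_six = probability(lambda point: point == (6, 6))
p_sum_8_or_more = probability(lambda point: sum(point) >= 8)

print(p_double_six)
print(p_sum_8_or_more)
```

Writing the event as a rule rather than a hand-made list avoids missing a sample point when the event is large.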


Example 6.8: Six white balls and four black balls, which are indistinguishable apart
from colour, are placed in a bag. If six balls are taken from the bag, find the
probability of there being three white and three black.

Solution:
White balls = 6, Black balls = 4, Total balls = 10

⇒ n(S) = ¹⁰C₆ = 210

Let A be the event that three white and three black balls are taken.

n(A) = ⁶C₃ × ⁴C₃ = 20 × 4 = 80

P(A) = n(A) / n(S) = 80/210 = 8/21


Example 6.9: A fair die is tossed. Find the probability that the number on the
uppermost face is not six.

Solution: Sample space is given by

S = {1, 2, 3, 4, 5, 6} ⇒ n(S) = 6

Let A be the event that the uppermost face is 6 and Ā be the event that the face is
not 6. Then
A = {6} ⇒ n(A) = 1

P(A) = n(A) / n(S) = 1/6

P(Ā) = 1 − P(A) = 1 − 1/6 = 5/6


Addition Theorem for not Mutually Exclusive Events

Statement: If A and B are two not mutually exclusive events, the probability that at least
one of them happens is the probability that event A occurs plus the probability that
event B occurs minus the probability that both A and B occur simultaneously, i.e.,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof: The event A or B can be expressed as the union of two mutually exclusive
events A and B − (A ∩ B), then

A ∪ B = A ∪ (B − A ∩ B)

(Venn diagram with regions A − A ∩ B, A ∩ B and B − A ∩ B.)

By taking probability on both sides we have,

P(A ∪ B) = P[A ∪ (B − A ∩ B)]
         = P(A) + P(B − A ∩ B)     ..... (i)

We can express B as a union of two mutually exclusive events A ∩ B and B − (A ∩ B), then

B = (A ∩ B) ∪ (B − A ∩ B)

By taking probability on both sides we have,

P(B) = P[(A ∩ B) ∪ (B − A ∩ B)]
P(B) = P(A ∩ B) + P(B − A ∩ B)
P(B) − P(A ∩ B) = P(B − A ∩ B)

By putting the value of P(B − A ∩ B) in equation (i) we get,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)


Example 6.10: A coin is tossed twice. The points of the sample space are HH, HT, TH, TT
and each sample point has probability 1/4.

If A and B are the events that a head appears on the first toss and a tail on the second toss
respectively, then find P(A ∪ B).

Solution: Sample space is given by
S = {HH, HT, TH, TT} ⇒ n(S) = 4

A = {HH, HT} ⇒ n(A) = 2
P(A) = n(A) / n(S) = 2/4

B = {TT, HT} ⇒ n(B) = 2
P(B) = n(B) / n(S) = 2/4

A ∩ B = {HT} ⇒ n(A ∩ B) = 1
P(A ∩ B) = n(A ∩ B) / n(S) = 1/4

and P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
             = 2/4 + 2/4 − 1/4 = 3/4

Example 6.11: Two dice are rolled. If A and B are respectively the events that the
sum of points is 8 and that both dice give odd numbers, then find P(A ∪ B).

Solution: Sample space is shown in Example 6.7, where we see

n(S) = 36

A = { (2,6), (3,5), (4,4), (5,3), (6,2) } ⇒ n(A) = 5
P(A) = n(A) / n(S) = 5/36

B = { (1,1), (1,3), (1,5), (3,1), (3,3), (3,5), (5,1), (5,3), (5,5) } ⇒ n(B) = 9
P(B) = n(B) / n(S) = 9/36

A ∩ B = { (3,5), (5,3) } ⇒ n(A ∩ B) = 2
P(A ∩ B) = n(A ∩ B) / n(S) = 2/36

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
         = 5/36 + 9/36 − 2/36 = 12/36 = 1/3
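
The addition theorem for events that are not mutually exclusive can be verified numerically on the events of Example 6.11. The sketch below assumes equally likely sample points and uses illustrative names (space, P, lhs, rhs) that are not part of the text; it counts the union directly and compares it with P(A) + P(B) − P(A ∩ B).

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))      # two dice, 36 points

A = {p for p in space if sum(p) == 8}                          # sum of points is 8
B = {p for p in space if p[0] % 2 == 1 and p[1] % 2 == 1}      # both dice odd


def P(event):
    """Probability of an event as n(event) / n(S)."""
    return Fraction(len(event), len(space))


lhs = P(A | B)                     # count the union directly
rhs = P(A) + P(B) - P(A & B)       # addition theorem

print(P(A), P(B), P(A & B))
print(lhs, rhs)
```

Both sides agree because the two points common to A and B would otherwise be counted twice in P(A) + P(B).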


Addition Theorem for Mutually Exclusive Events

If A and B are two mutually exclusive events, then the probability of either of
them happening is the sum of their respective probabilities, i.e.,

P(A ∪ B) = P(A) + P(B)

Proof: Let n be the number of sample points in a sample space S. Let A and B be
two mutually exclusive events in the sample space S, such that event A contains m₁ sample
points and event B contains m₂ sample points. Then A ∪ B will contain the sample
points belonging to either A or B.

As A and B are two mutually exclusive events,
i.e., A ∩ B = ∅

the number of sample points in A ∪ B will be m₁ + m₂.

P(A ∪ B) = (m₁ + m₂)/n = m₁/n + m₂/n = P(A) + P(B)

∴ P(A ∪ B) = P(A) + P(B)


Example 6.12: A pair of dice is rolled. Find the probability that the sum of the
uppermost dots is either 6 or 9.

Solution: Sample space is shown in Example 6.7, where we see
n(S) = 36

Let A be the event that the sum of the uppermost dots is 6, then
A = { (1,5), (2,4), (3,3), (4,2), (5,1) } ⇒ n(A) = 5
P(A) = n(A) / n(S) = 5/36

Let B be the event that the sum of the uppermost dots is 9, then
B = { (3,6), (4,5), (5,4), (6,3) } ⇒ n(B) = 4
P(B) = n(B) / n(S) = 4/36 = 1/9

Since A and B are mutually exclusive events,
∴ P(A ∪ B) = P(A) + P(B) = 5/36 + 4/36 = 9/36 = 1/4


Example 6.13: A pair of dice is thrown. Find the probability of getting a total of
either 5 or 11.

Solution: Sample space is shown in Example 6.7, where we see
n(S) = 36

Let A be the event that a total of 5 occurs, then
A = { (1,4), (2,3), (3,2), (4,1) } ⇒ n(A) = 4
∴ P(A) = n(A) / n(S) = 4/36 = 1/9

Let B be the event that a total of 11 occurs.
B = { (5,6), (6,5) } ⇒ n(B) = 2
∴ P(B) = n(B) / n(S) = 2/36 = 1/18

As events A and B are mutually exclusive, then

P(A ∪ B) = P(A) + P(B) = 1/9 + 1/18 = 3/18 = 1/6


Example 6.14: Three horses A, B and C are in a race. A is twice as likely to win as B
and B is twice as likely to win as C, then

i) What are their respective chances of winning?
ii) What is the probability that B or C wins?

Solution:

i) Let P(C) = P
   P(B) = 2P(C) = 2P
   P(A) = 2P(B) = 2(2P) = 4P

Since A, B and C are mutually exclusive and collectively exhaustive
events, the sum of their probabilities must be equal to one, i.e.,
P(A) + P(B) + P(C) = 1
⇒ 4P + 2P + P = 1
⇒ 7P = 1
⇒ P = 1/7

∴ P(A) = 4/7, P(B) = 2/7 and P(C) = 1/7

ii) As B and C are mutually exclusive events, so

P(B ∪ C) = P(B) + P(C) = 2/7 + 1/7 = 3/7


6.4.1 Conditional Probability 


If two events A and B are defined on a sample space S and if the
probability of B is not equal to zero, then the conditional probability of an
event A given that B has occurred is written as P(A/B) and is defined as:

P(A/B) = P(A ∩ B) / P(B), where P(B) > 0

If P(B) = 0, the conditional probability is undefined.


Multiplication Theorem For Independent Events

If A and B are two independent events, then the probability that both A and B
happen is the product of their respective probabilities, i.e.,

P(A ∩ B) = P(A) P(B)

Proof: Let event A have n possible outcomes and m favourable outcomes, and event B
have N possible outcomes and M favourable outcomes, then

P(A) = m/n and P(B) = M/N

As A and B are independent events, there will be nN possible outcomes for
the joint event A and B, as each of the n possible outcomes for A is associated with each of the N
possible outcomes for B. Similarly, each favourable outcome for A is associated with
each favourable outcome for B to give the favourable outcomes for the compound
event A ∩ B. Then the total favourable outcomes for A and B are mM.

P(A ∩ B) = mM / nN = (m/n) (M/N)

or P(A ∩ B) = P(A) P(B)

Multiplication Theorem for not Independent Events

If A and B are two events that are not independent, then the probability that
both A and B happen is the probability of event A multiplied by the conditional
probability of B given that event A has already occurred, or is the probability of event
B multiplied by the conditional probability of event A given that event B has already
occurred, i.e.,

P(A ∩ B) = P(A) P(B/A) or P(A ∩ B) = P(B) P(A/B)

Proof: Let us have n sample points in a sample space S, and let A and B be not
independent events such that event A has m₁, event B has m₂ and A ∩ B has m₃
sample points.

Then, P(A) = m₁/n and P(B) = m₂/n

P(A ∩ B) = m₃/n

Now, P(A ∩ B) = m₃/n = (m₁/n) × (m₃/m₁)

Here, m₁/n = P(A) and m₃/m₁ = P(B/A)

P(A ∩ B) = P(B/A) P(A)

or P(A ∩ B) = m₃/n = (m₂/n) × (m₃/m₂)

Here, m₂/n = P(B) and m₃/m₂ = P(A/B)

P(A ∩ B) = P(A/B) P(B)


Example 6.15: Two a's and two b's are arranged in order, all arrangements being
equally likely. Given that the last letter in order is b, find the probability that the two a's
are together.

Solution: S = { aabb, abba, bbaa, baab, baba, abab } ⇒ n(S) = 6

Let A be the event that the two a's are together.
A = { aabb, bbaa, baab } ⇒ n(A) = 3
P(A) = n(A) / n(S) = 3/6 = 1/2

Let B be the event that the last letter is b.
B = { aabb, abab, baab } ⇒ n(B) = 3
P(B) = n(B) / n(S) = 3/6 = 1/2

A ∩ B = { aabb, baab } ⇒ n(A ∩ B) = 2
P(A ∩ B) = n(A ∩ B) / n(S) = 2/6 = 1/3

P(A/B) = P(A ∩ B) / P(B) = (1/3) / (1/2) = 2/3
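
The conditional probability in Example 6.15 can be checked by listing the arrangements and applying P(A/B) = P(A ∩ B) / P(B). The following sketch assumes the six distinct arrangements are equally likely, as the example states; the names arrangements, A, B and p_a_given_b are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Distinct orderings of the letters a, a, b, b (duplicates removed by the set).
arrangements = sorted({"".join(p) for p in permutations("aabb")})

A = {w for w in arrangements if "aa" in w}         # the two a's are together
B = {w for w in arrangements if w.endswith("b")}   # the last letter is b


def P(event):
    """Probability of an event as n(event) / n(S)."""
    return Fraction(len(event), len(arrangements))


p_a_given_b = P(A & B) / P(B)      # definition of conditional probability

print(arrangements)
print(P(A), P(B), P(A & B))
print(p_a_given_b)
```

Conditioning on B amounts to treating B as the new sample space, so the same value is obtained as n(A ∩ B) / n(B).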


Example 6.16: In tossing two coins find:
i) The probability of two heads given that there is a head on the first coin.
ii) The probability of two heads given that there is at least one head.

Solution: Sample space is

S = {HH, HT, TH, TT} ⇒ n(S) = 4

i) Let A be the event that the head appears on the first coin.
A = {HH, HT} ⇒ n(A) = 2
∴ P(A) = n(A) / n(S) = 2/4 = 1/2

Let B be the event that two heads appear.
B = {HH} ⇒ n(B) = 1
∴ P(B) = n(B) / n(S) = 1/4

A ∩ B = {HH} ⇒ n(A ∩ B) = 1
∴ P(A ∩ B) = n(A ∩ B) / n(S) = 1/4

P(B/A) = P(A ∩ B) / P(A) = (1/4) / (1/2) = 1/2

ii) Let C be the event that at least one head appears.
C = {HH, HT, TH} ⇒ n(C) = 3
P(C) = n(C) / n(S) = 3/4

B = {HH} ⇒ n(B) = 1
P(B) = n(B) / n(S) = 1/4

B ∩ C = {HH} ⇒ n(B ∩ C) = 1
P(B ∩ C) = n(B ∩ C) / n(S) = 1/4

P(B/C) = P(B ∩ C) / P(C) = (1/4) / (3/4) = 1/3
Example 6.17: A and B are two independent events. If P(A) = 0.40, P(B) = 0.30,
find the probabilities

i) P(A ∩ B)     ii) P(A/B)     iii) P(B/A)
iv) P(A ∪ B)    v) P(Ā ∩ B̄)    vi) P(Ā/B̄)

Solution: We have
P(A) = 0.40, P(B) = 0.30

Since A and B are independent events, therefore,

i) P(A ∩ B) = P(A) P(B) = (0.40)(0.30) = 0.12

ii) P(A/B) = P(A ∩ B) / P(B) = 0.12 / 0.30 = 0.40

iii) P(B/A) = P(A ∩ B) / P(A) = 0.12 / 0.40 = 0.30

iv) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
             = 0.40 + 0.30 − 0.12 = 0.70 − 0.12
             = 0.58

v) P(Ā ∩ B̄) = P[(A ∪ B)′]
            = 1 − P(A ∪ B) = 1 − 0.58 = 0.42

vi) P(Ā/B̄) = P(Ā ∩ B̄) / P(B̄) = P(Ā ∩ B̄) / (1 − P(B))
           = 0.42 / (1 − 0.30) = 0.42 / 0.70 = 0.60








Example 6.18: Three missiles are fired at a target. If the probabilities of hitting the
target are 0.4, 0.5 and 0.6 respectively and if the missiles are fired independently, what
is the probability that at least two missiles hit the target?

Solution:
P(A) = 0.4 ⇒ P(Ā) = 0.6
P(B) = 0.5 ⇒ P(B̄) = 0.5
P(C) = 0.6 ⇒ P(C̄) = 0.4

Let D be the event that at least two shots hit the target.

P(D) = P(A ∩ B ∩ C̄) + P(A ∩ B̄ ∩ C) + P(Ā ∩ B ∩ C) + P(A ∩ B ∩ C)
     = P(A) P(B) P(C̄) + P(A) P(B̄) P(C) + P(Ā) P(B) P(C) + P(A) P(B) P(C)
     = 0.4 × 0.5 × 0.4 + 0.4 × 0.5 × 0.6 + 0.6 × 0.5 × 0.6 + 0.4 × 0.5 × 0.6
     = 0.08 + 0.12 + 0.18 + 0.12
or P(D) = 0.50
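
The four terms in Example 6.18 are the hit and miss patterns with two or more hits. As a sketch of how the same sum can be formed for any number of independent shots, the code below enumerates every pattern and multiplies the matching probabilities; prob_at_least is a hypothetical helper name, and exact fractions are an implementation choice made here to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Hit probabilities of the three missiles, stored as exact fractions.
hit_probs = [Fraction(4, 10), Fraction(5, 10), Fraction(6, 10)]


def prob_at_least(k):
    """Probability that at least k of the independent shots hit the target."""
    total = Fraction(0)
    # Each pattern is a tuple such as (True, False, True): hit, miss, hit.
    for pattern in product([True, False], repeat=len(hit_probs)):
        if sum(pattern) >= k:
            term = Fraction(1)
            for hit, p in zip(pattern, hit_probs):
                term *= p if hit else 1 - p   # multiplication theorem for independent events
            total += term
    return total


p_at_least_two = prob_at_least(2)
print(p_at_least_two, float(p_at_least_two))
```

The patterns are mutually exclusive, so their probabilities may be added, and independence allows each pattern's probability to be written as a product.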


Example 6.19: A purse contains 2 silver and 4 copper coins and a second purse contains 4
silver and 3 copper coins. If a coin is selected at random from one of the purses, what is
the probability that it is a

i) Silver coin     ii) Copper coin.

Solution:

                 Purse I    Purse II
Silver coins        2          4
Copper coins        4          3
Total coins         6          7

P(Purse I) = 1/2, P(Purse II) = 1/2

i) Let A be the event that the selected coin is silver, then

P(A) = P(Purse I) P(silver coin / Purse I) + P(Purse II) P(silver coin / Purse II)
     = (1/2 × 2/6) + (1/2 × 4/7) = 2/12 + 4/14 = 38/84 = 19/42

ii) Let B be the event that the coin selected is copper, then

P(B) = P(Purse I) P(copper coin / Purse I) + P(Purse II) P(copper coin / Purse II)
     = (1/2 × 4/6) + (1/2 × 3/7) = 4/12 + 3/14 = 46/84 = 23/42
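
Example 6.19 weights the chance of each coin type within a purse by the chance of choosing that purse. A minimal sketch of this calculation is given below, assuming each purse is equally likely to be chosen; the data layout and the name prob_coin are illustrative only.

```python
from fractions import Fraction

# Contents of the two purses from Example 6.19.
purses = [
    {"silver": 2, "copper": 4},   # Purse I
    {"silver": 4, "copper": 3},   # Purse II
]


def prob_coin(kind):
    """P(purse) x P(kind / purse), summed over the purses."""
    p_purse = Fraction(1, len(purses))   # each purse is equally likely
    return sum(
        p_purse * Fraction(purse[kind], sum(purse.values()))
        for purse in purses
    )


p_silver = prob_coin("silver")
p_copper = prob_coin("copper")
print(p_silver, p_copper)
```

Since the coin drawn must be either silver or copper, the two results add up to one, which is a quick check on the arithmetic.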


Exercise 6 Ans on Page 255 


6.1 i) Define permutation and combination and discriminate between these
two.

ii) Let A = {1, 3, 5, 7}, B = {2, 4, 6, 8}, C = {1, 2, 3, 4, 5}
and S = {1, 2, 3, 4, 5, 6, 7, 8};
list the elements of the following:

a) A ∩ B     b) C ∪ A     c) A ∩ C     d) B ∩ C


6.2 Evaluate the following 


i) 7!     ii) 16!/8!     iii) 17!/(8! 4!)
iv) ’ DP, vy" P. vi) °P, 
vii) °c, viii) *c,; me, 


6.3 Let A = {1,4}, B= {2, 3}, C= {3} be the subsets of the universal set 

S ={1, 2, 3, 4}, find 

i) A × B     ii) B × A     iii) A × A     iv) B × B
6.4. What do you understand by 

i) sample space ii)event iii) sample point iv) simple and compound event? 
6.5 Define mutually exclusive, independent and dependent events. 


6.6 Explain with examples the terms; random experiment, sample space and an 


event. 


6.7. State and prove the multiplicative law of probability for two events A and B 
that are not statistically independent. 


6.8 Find the probability of each of the following:

i) A head appears in tossing a fair coin.
ii) A 5 appears in rolling a six faced cubical die.
iii) An even number appears when a perfect cubical die is rolled.

6.9 What is the probability of selecting a card of diamonds from a pack of playing
cards consisting of the usual 52 cards?

6.10 Show that in a single throw with two dice, the chance of throwing more than 7
is equal to that of throwing less than 7.

6.11 A bag contains 12 balls of which 3 are marked. If 5 balls are drawn out
together, what is the probability that 3 of the marked balls are among
these 5?

6.12 What is the probability of throwing either 7 or more than 10 with two dice?

6.13 A bag contains 2 red, 3 green, 5 blue and 2 yellow balls. Find the
probability that balls of all colours are represented in a sample if four balls
are selected at random.

6.14 A bag contains 5 white and 7 black balls. If 3 balls are drawn from the bag,
what is the probability that:

i) All are white.
ii) Two are white and one is black.
iii) All are of the same colour?

6.15 Determine the probability for the following events:

i) The sum 8 appears in a single toss of a pair of fair dice.
ii) A sum 7 or 11 comes up in a single toss of a pair of fair dice.
iii) A ball drawn at random from a bag containing 5 red, 6 white, 4 blue
and 3 orange balls is either red or blue.

6.16 Two cards are drawn at random from a well shuffled pack of 52 cards. Find
the probability that:

i) One is king and other is queen.
ii) Both are of same colour.
iii) Both are of different colours.

6.17 A bag contains 9 white and 12 black balls. Find the probability of drawing 5
black balls out of the bag containing 21 balls.

6.18 From a bag containing 5 white and 3 black balls, 2 are drawn at random. Find
the chance that both are of the same colour.

6.19 A set of eight cards contains one joker. A and B are two players; A chooses 5
cards at random and B takes the remaining 3 cards. What is the probability that A
has the joker?

6.20 From a pack of 52 cards, two cards are drawn. What is the probability that one
is king and the other is queen?

6.21 In a poker hand consisting of 5 cards, what is the probability of holding

i) 2 aces and 2 kings.
ii) 5 spades.

6.22 A bag contains 14 identical balls, out of which 4 are red, 5 black and 5
white. If three balls are drawn from the bag, find the chance that

i) 3 are red.
ii) At least two are white.

6.23 A marble is drawn at random from a box containing 10 red, 30 white, 20 blue
and 15 orange marbles. Find the probability that it is:

i) orange or red.
ii) not red or orange.
iii) not blue.
iv) red, white or blue.

6.24 What is meant by conditional probability?

6.25 State and prove multiplication laws of probabilities for independent and
dependent events.

6.26 A and B can solve 60% and 80% of the problems in a book respectively.
What is the probability that either A or B can solve a problem chosen at
random?


6.27 A class contains 10 men and 20 women, out of which half the men and half
the women have brown eyes. Find the probability that a person chosen at random
is a man or has brown eyes.

6.28 A box contains 9 tickets numbered 1 to 9. If 3 tickets are drawn from the box
one at a time, find the probability that they are alternately either odd even odd
or even odd even.

6.29 Two drawings, each of 3 balls, are made from a bag containing 5 white and 8
black balls. The balls are not replaced before the next trial. What is the
probability that the first drawing will give 3 white balls and the second
drawing will give three black balls?

6.30 Three balls are drawn successively from a box containing 6 red balls, 4 white
balls and 5 blue balls. Find the probability that they are drawn in the order
red, white and blue if each ball is

i) replaced
ii) not replaced.

6.31 One bag contains 4 white and 2 black balls; another bag contains 3 white and 5
black balls. If one ball is drawn from each bag, find the probability that

i) both are white
ii) both are black.

6.32 Two urns contain respectively 3 white and 7 red balls 15 black and 10 white
balls, 6 red and 9 black balls. One ball is taken from each urn. What is the
probability that both will be of the same colour?

6.33 The probability that a man will be alive in 25 years is 3/5 and that his wife will
be alive in 25 years is 2/3. Find the probability that

i) Both will be alive in 25 years.
ii) Only the man will be alive in 25 years.
iii) Only the wife will be alive in 25 years.
iv) At least one will be alive in 25 years.
v) None of them will be alive in 25 years.
vi) At the most one will be alive in 25 years.


6.34 Three cards are drawn at random from an ordinary pack of 52 cards. Find the
probability that they will consist of a jack, a queen and a king.

6.35 Two cards are drawn from a well shuffled pack of 52 cards. Find the
probability that they are both aces if the first card drawn is:

i) replaced.
ii) not replaced.

6.36 Urn A contains 5 red balls and 3 white balls and urn B contains 2 red balls and
6 white balls.

i) If a ball is drawn from each urn, what is the probability that both are of
the same colour?
ii) If two balls are drawn from each urn, what is the probability that all 4
balls are of the same colour?

6.37 Three Ghori missiles are fired at a target. If the probabilities of hitting the
target are 0.4, 0.5 and 0.6 respectively and the missiles are fired
independently, what is the probability that at least 2 missiles hit the target?

6.38 Assume that X is a number chosen at random from the set of integers between
1 and 14, inclusive. What is the probability that

i) X is a single digit number.
ii) X is a multiple of 5 or 6.

6.39 What are the odds for the occurrence of an event if its probability is 4/7?

6.40 Suppose that it is 9 to 7 against a person A who is now 35 years of age living
till he is 65 and 3 to 2 against a person B now 45 years of age living till he is
75. Find the probability that one of these persons will be alive 30 years hence.

6.41 A purse contains 2 silver and 4 copper coins and a second purse contains 4 silver and
3 copper coins. If a coin is selected at random from one of the purses, what
is the probability that it is a

i) Silver coin.
ii) Copper coin.

6.42 One purse contains 1 sovereign and 3 shillings, a second purse contains
2 sovereigns and 4 shillings and a third purse contains 3 sovereigns and 1 shilling.
If a coin is taken out of a purse selected at random, find the chance that it is a
sovereign.


6.43 The probabilities that a student will get a grade of A, B or C in a statistics course
are 0.09, 0.15 and 0.53 respectively. What is the probability that the student
will get a grade lower than C?

6.44 3 coins are tossed. What is the probability of getting at least one head?

6.45 A and B play 12 games of chess, out of which 6 are won by A and 4 by B and
two games end in a tie. They agree to play a tournament consisting of 3
games. Find the probability that

i) A wins all the 3 games.
ii) Two games end in a tie.
iii) A and B win alternately.
iv) B wins at least 1 game.

6.46 For two independent events A and B, P(A) = 0.25 and P(B) = 0.40. Find
P(A ∩ B).

6.47 For 2 rolls of a balanced die, find the probability of getting first a five and then
a number less than 4.

6.48 If two cards are drawn from an ordinary deck of 52 cards, what is the
probability that both will be diamonds, if the drawing is without replacement?

6.49 A, B and C take turns in throwing a die for a prize to be given to the one who first
obtains 6. Compare their chances of success.

6.50 The first bag contains 4 white balls and 3 black balls and the second bag contains 3
white and 5 black balls. One ball is drawn from the first bag and placed
unseen in the second bag. What is the probability that a ball now drawn from
the second bag is black?

6.51 From a bag containing 4 white and 5 black balls, 2 balls are drawn at random.
Find the probability that they are of the same colour.

6.52 3 groups of children contain respectively 3 girls and 1 boy, 2 girls and
2 boys, and 1 girl and 3 boys. One child is selected at random from
each group. Find the chance that the selected group comprises one girl and
two boys.

6.53 A can hit a target 4 times in 5 shots, B can hit 2 times in 5 shots and C can hit
2 times in 4 shots. Find the probability that

i) 2 shots hit.
ii) at least two shots hit.
6.54 Fill in the blanks: 

i) A set is any ________ collection of things. 
ii) A set containing only one element is called ________ set. 
iii) A set of subsets of a set is called ________ of the set. 
iv) A ________ event contains more than one element. 
v) Two events are ________ if they have no common point. 
vi) If A U B = S then A and B are ________ events. 
vii) If n(A) = n(B), then A and B are ________ events. 
viii) The number of subsets of a set containing n points are ________. 
ix) The orderly arrangements of r distinct things out of n are called ________ 
and denoted by ________. 
x) A non-orderly arrangement of things is called ________. 


6.55 Against each statement, write T for true and F for false statement. 

i) The null set is also named as the impossible event. 
ii) A part of a set is called improper subset of the set. 
iii) A subset of sample space is called sure event. 
iv) An event consisting of only one sample point is called compound event. 
v) If a coin is tossed four times, the number of sample points will be 22. 
vi) If A and B are mutually exclusive events then, P(A U B) = P(A) + P(B). 
vii) The probability of drawing a red card from 52 cards is 13/15. 
viii) The complementary events are always mutually exclusive events. 
ix) When a coin is tossed, the sample space is {HH, HT}. 
x) If A and B are independent, then P(A U B) = P(A) + P(B). 
7.1 Introduction 


Every random experiment results in two or more outcomes, and usually the 
interest is in a particular aspect of the outcomes of the experiment. For example, 
when a pair of dice is thrown, the interest may be in the total of upturned dots on both 
dice. In this experiment, the total may be 2 (one on each die), 3 (one on the 
first die and two on the other) and so on up to 12. In the 
language of probability, these values associated with outcomes are the values of a 
so-called random variable. Another example is the number of children in each 
of fifty randomly chosen families. If no family has more than 5 children, then the 
values of this random variable, i.e., the number of children in a family, would be 
0, 1, 2, 3, 4, 5, i.e., no child, one child, two children, three children, four children 
and five children respectively. 


A variable whose values depend upon the outcomes of a random experiment is 
called a random variable. We will denote random variables by the capital letters 
X, Y or Z and their values by the corresponding small letters x, y or z. 


Example 7.1: Let a pair of dice be thrown and let Y denote the random variable that is 
the sum of upturned values on the two dice. There are 36 outcomes, and Y assigns to 
the outcome (1, 1) the real number 1 + 1 = 2. It assigns to the outcome (2, 1) the real 
number 2 + 1 = 3, and so on up to the outcome (6, 6), which gets the real number 
6 + 6 = 12. So, the values assigned are 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12. 
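
The assignment in Example 7.1 can be sketched in a few lines of Python. This is only an illustration: the function name `dice_sum_values` is a hypothetical helper, not something defined in this text.

```python
from itertools import product


def dice_sum_values():
    """Map each of the 36 outcomes of two dice to the value Y = sum of the faces."""
    outcomes = product(range(1, 7), repeat=2)  # (1, 1), (1, 2), ..., (6, 6)
    return {outcome: sum(outcome) for outcome in outcomes}


y = dice_sum_values()
print(len(y))                   # number of outcomes
print(y[(1, 1)], y[(2, 1)])     # values assigned to (1, 1) and (2, 1)
print(sorted(set(y.values())))  # distinct values taken by Y
```

Each key of the dictionary is an outcome of the experiment and each value is the number the random variable Y assigns to it.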


7.2 Random numbers and their generation 


Random numbers are a sequence of digits from the set {0, 1, 2, ..., 9} such that, at 
each position in the sequence, each digit has the same probability 0.1 of being 
selected, irrespective of the sequence constructed so far. The probability is 0.1 
because each of the ten digits {0, 1, 2, ..., 9} is equally likely, i.e., has probability 
1/10 or 0.1. These are also known as random digits. 


The simplest ways of obtaining such numbers are games of chance such as 
dice, coins, cards or repeatedly drawing numbered slips out of a hat. The digits are 
usually grouped purely for convenience of reading, but producing them in this way 
becomes very tedious for long runs of digits. Fortunately, tables of random digits are 
now widely available (see table 7.1). 


For implementation on computers, to provide sequences of such digits easily 
and quickly, the most common methods are called pseudo-random techniques. 
Here, the digits will eventually re-appear in the same order (i.e., cycle), but for a good 
technique the cycle might be tens of thousands of digits long. Of course, pseudo-random 
digits, as the name says, are not truly random. In fact, they are completely 
deterministic, but they do exhibit most of the properties of random digits. 


Generally, these methods involve a recursive formula of the form 


$$x_{n+1} = a x_n + b \pmod{m}; \quad n = 0, 1, 2, 3, \ldots \qquad (7.1)$$


Here a, b and m are suitably chosen integer constants and the seed $x_0$ (a 
starting number) is an integer. By mod m we mean that $a x_n + b$ is divided by m 
and the remainder is kept as the random number. Use of this 
formula gives rise to a sequence of integers, each of which is in the range 0 to m-1. 
We simply run these together to give our sequence of pseudo-random digits. Clearly, 
for this to be of any value, m, a and/or b should be large. 


Example 7.2: Let a = 13, b = 0 and m = 16. Generate 4 random numbers. 


Solution: According to the relation 


$$x_{n+1} = a x_n + b \pmod{16} \quad \text{for } n = 0, 1, 2, \ldots$$


Let the seed $x_0$ be 5. Then for n = 0, we have 


$$x_1 = 13 x_0 + b \pmod{16} = 13(5) + 0 \pmod{16} = 65 \pmod{16} = 1$$

(dividing 65 by 16, the remainder is 1). 


For n = 1, we have 


$$x_2 = 13(1) + 0 \pmod{16} = 13$$


For n = 2, we have 


$$x_3 = 13(13) + 0 \pmod{16} = 169 \pmod{16} = 9$$


Similarly, for n = 3, we have 


$$x_4 = 13(9) + 0 \pmod{16} = 117 \pmod{16} = 5$$


So, the random numbers are 1, 13, 9, 5. 
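
The recursion (7.1) is easy to try on a computer. The following is a minimal sketch, assuming the hypothetical function name `lcg` and parameter names chosen here for illustration; it reproduces the numbers of Example 7.2.

```python
def lcg(seed, a, b, m, count):
    """Return `count` successive values of x_{n+1} = (a * x_n + b) mod m."""
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + b) % m  # remainder after dividing a*x + b by m
        values.append(x)
    return values


# Example 7.2: a = 13, b = 0, m = 16, seed x_0 = 5
print(lcg(seed=5, a=13, b=0, m=16, count=4))  # [1, 13, 9, 5]
```

Because the fourth value equals the seed, this small generator repeats with a cycle of length 4, which illustrates why m, a and/or b should be large in practice.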


7.3 Application of random numbers 

Random numbers have wide applicability in simulation techniques 
(also called Monte Carlo methods), which have been applied to many problems in the 
various sciences and are useful in situations where direct experimentation is not 
possible, the cost of conducting an experiment is very high or the experiment takes 
too much time. 


Random number tables are constructed from random numbers, where 
the random digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) are grouped for ease of reading. The 
groups may consist of 2-digit, 3-digit, 4-digit or 5-digit sequences. Five-digit 
sequences are given in table 7.1. We 
may use this table for 1-digit random numbers, 2-digit random numbers and so on 
up to 5-digit random numbers, depending upon the situation, by reading it row-wise or 
column-wise. 

To explain the use of a random number table, consider the following example. 


Example 7.3: Count the number of heads and tails when a single coin is thrown 
12 times, without actually throwing a coin. 


Solution: The first step is to assign the digits 0 to 9 to heads and tails. 
Let 0, 2, 4, 6, 8 (even numbers including zero) indicate heads and the odd numbers 1, 
3, 5, 7, 9 represent tails. 

The second step is to open the random number table given in table 7.1, 
select arbitrarily a column, say column 1, and a row, say row 4, and 
start reading 1-digit numbers, i.e., 4, 8, 5, 7, 7, 2, 5, 0, 1, 5, 9, 4. The third step is to 
interpret them as H, H, T, T, T, H, T, H, T, T, T, H, since head was decided for 0, 2, 
4, 6, 8 and tail for 1, 3, 5, 7, 9. So, there are 5 heads and 7 tails. These are given in 
the following frequency table. 

Outcome            Frequency 
Head (one head)        5 
Tail (0 head)          7 


Similarly, to count the number of heads when two coins are thrown 20 times, 
the first step is to take two-digit random numbers. If both digits are even (including 
zero), then both are heads; if both are odd, then both are tails (0 head); and if one is 
even and one is odd, then there is one head. The second step is to open the random number 
table and select arbitrarily a column, say column 2, and a row, say row 
2, and start reading two-digit numbers, i.e., 24, 80, 76, 06, 52, 34, 92, 03, 92, 93, 28, 
83, 15, 03, 91, 20, 52, 92, 59, 56. The third step is to interpret them as HH, HH, TH, HH, 
TH, TH, TH, HT, TH, TT, HH, HT, TT, HT, TT, HH, TH, TH, TT, TH. So the 
frequency table would be 

Number of heads    Frequency 
0                      4 
1                     11 
2                      5 
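
The even/odd rule of Example 7.3 can be checked with a short Python sketch. The helper names `digits_to_tosses` and `heads_per_pair` are assumptions made for this illustration, and the digits are typed in from the example rather than read from table 7.1.

```python
from collections import Counter


def digits_to_tosses(digits):
    """Even digit (including 0) -> 'H', odd digit -> 'T'."""
    return ["H" if d % 2 == 0 else "T" for d in digits]


def heads_per_pair(two_digit_numbers):
    """Count heads in each two-digit number: every even digit is one head."""
    return [sum(int(ch) % 2 == 0 for ch in number) for number in two_digit_numbers]


# Single coin, 12 tosses (digits read in Example 7.3)
single = [4, 8, 5, 7, 7, 2, 5, 0, 1, 5, 9, 4]
print(Counter(digits_to_tosses(single)))

# Two coins, 20 tosses (kept as strings so that "06" and "03" keep their leading zero)
pairs = ["24", "80", "76", "06", "52", "34", "92", "03", "92", "93",
         "28", "83", "15", "03", "91", "20", "52", "92", "59", "56"]
print(Counter(heads_per_pair(pairs)))
```

Keeping the two-digit numbers as strings is a deliberate choice: converting "06" to the integer 6 would lose the leading digit.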
Another method, known as probability proportional to size (PPS), is used when 
we have information about the probabilities of the outcomes. 


Example 7.4: If a coin is thrown thrice (or three coins are tossed once), then the 
number of heads may be 0 (all tails), 1, 2 or 3, and the corresponding relative 
frequencies would be as given in the adjoining table. Find the sequence of the 
number of heads 20 times without throwing a coin. 

Number of heads    Probability 
0                  0.125 
1                  0.375 
2                  0.375 
3                  0.125 

Solution: Here we use 3-digit random numbers because the probabilities are given 
to three decimals. 


The first step is to make a cumulative probabilities (c.p) column. 


The second step is to assign random numbers (r.no) from 0 to 999 because we 
have 3-digit probabilities. Note that in each class the largest random number assigned 
is one less than the number formed by the digits of the corresponding cumulative 
probability. The reason is that the random numbers run from 0 to 999, not from 1 to 
1000. These are shown in table 7.2. 

Table 7.2: cumulative probabilities and range. 

Number of heads    Probability    c.p      r.no 
0                  0.125          0.125    000-124 
1                  0.375          0.500    125-499 
2                  0.375          0.875    500-874 
3                  0.125          1.000    875-999 





The third step is to open the random number table 7.1 and select arbitrarily a 
column, say column 4, and a row, say row 2. The 3-digit random 
numbers are 441, 833, 789, 924, 976, 562, 635, 122, 793, 472, 993, 952, 309, 420, 
676, 662, 390, 782, 348, 773 and the corresponding numbers of heads are 1, 2, 2, 3, 3, 
2, 2, 0, 2, 1, 3, 3, 1, 1, 2, 2, 1, 2, 1, 2. 

441 indicates 1 head because it is in the class corresponding to 1 head. 

833 indicates 2 heads because it is in the class corresponding to 2 heads. 

789 indicates 2 heads because it is in the class corresponding to 2 heads. 

924 indicates 3 heads because it is in the class corresponding to 3 heads, and 
so on. The corresponding frequency table is as given in table 7.3. 


Table 7.3: Frequency Distribution 

Number of heads    Frequency 
0                      1 
1                      6 
2                      9 
3                      4 
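
The class lookup of Example 7.4 can be sketched as follows. The names `CUMULATIVE` and `heads_for` are hypothetical, and the probabilities are written in thousandths (an assumption made here to avoid rounding with decimals).

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate

# Probabilities of 0, 1, 2, 3 heads in thousandths: 0.125, 0.375, 0.375, 0.125
THOUSANDTHS = [125, 375, 375, 125]
CUMULATIVE = list(accumulate(THOUSANDTHS))  # [125, 500, 875, 1000]


def heads_for(random_number):
    """Return the class (number of heads) whose r.no range contains random_number."""
    # 000-124 -> 0, 125-499 -> 1, 500-874 -> 2, 875-999 -> 3
    return bisect_right(CUMULATIVE, random_number)


numbers = [441, 833, 789, 924, 976, 562, 635, 122, 793, 472,
           993, 952, 309, 420, 676, 662, 390, 782, 348, 773]
sequence = [heads_for(r) for r in numbers]
print(sequence)
print(sorted(Counter(sequence).items()))
```

The upper limit of each class is one less than the cumulative value, which is why `bisect_right` on the cumulative list gives the class directly.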





7.4 Concept of random variables and their construction from different fields 


As we know by now, random variables arise from the outcomes of 
random experiments by associating a value with each outcome. The following 
examples may help explain them in detail. 


Example 7.5: Consider an experiment in which three students in a class are asked to 
take one of the two courses Biology (B) or Computer Science (C). 


Solution: Define the random variable Y by 
Y = The number of students taking computer science. 
The possible values are 
No student takes computer science so, y = 0 
One student takes computer science so, y = 1 
Two students take computer science so, y = 2 
Three students take computer science so, y = 3 


The possible outcomes of the experiment are: 

BBB   first takes biology, second takes biology, third takes biology. 
CBB   first takes computer science, second takes biology, third takes biology. 
BCB   first takes biology, second takes computer science, third takes biology. 
BBC   first takes biology, second takes biology, third takes computer science. 
CCB   first takes computer science, second takes computer science, third 
      takes biology. 
CBC   first takes computer science, second takes biology, third takes 
      computer science. 
BCC   first takes biology, second takes computer science, third takes 
      computer science. 
CCC   first takes computer science, second takes computer science, third 
      takes computer science. 


These are represented as follows: 

Outcome   BBB  CBB  BCB  BBC  CCB  CBC  BCC  CCC 
y          0    1    1    1    2    2    2    3 


Note that the values of the random variable Y are isolated points 0, 1, 2 and 3. 


Example 7.6: Consider an experiment of recording the time (in minutes) that 
customers wait for their turn while standing in a queue at a utility store. 


Define the random variable Y, where Y is the time taken by a customer. 


Solution: The first customer may take 5.0 minutes, the second may take 6.0 minutes, 
the third may take 5.30 minutes, the fourth may take 12.0 minutes and so on. 


It may be noted that Y may take any value in an interval on the number line. 


7.5 Discrete and continuous random variables 


Discrete random variable: A random variable is called discrete if the set of values 
it takes is a collection of isolated points on the real line, i.e., the sample space S is a 
discrete sample space. The outcomes of an experiment are noted and the value of the 
random variable is a number appropriately assigned to each outcome by some rule. 


Example 7.7: Three coins are tossed and let the random variable Y denote the 
number of heads. Then the outcomes and the values of Y are given in the adjoining table. 

Outcome    Value of Y 
HHH            3 
HHT            2 
HTH            2 
THH            2 
HTT            1 
THT            1 
TTH            1 
TTT            0 


The values of Y are whole numbers. So, the sample space is discrete. Thus Y is 
a discrete random variable. 


Example 7.5 above is another example of a discrete random variable. 


Continuous random variable: A random variable is called continuous if the set of 
values it takes is an entire interval on the number line, i.e., the sample space S is 
continuous. The outcomes of an experiment are represented by the points on a line 
and the value of the random variable is a number appropriately assigned to each point 
by some rule. 


Example 7.8: Consider the experiment of measuring the height of students in a 
statistics class. If the minimum height of a student is 5.0 feet and the maximum 
height is 5.8 feet, then the variable Y, the height of students, takes values between 5.0 
and 5.8 feet, i.e., in the interval 5.0 to 5.8 feet. 


Example 7.6 is also an example of a continuous random variable. 
Exercise 7 (Answers on page 257) 


7.1 i) Define random variable and give an example to explain it. 

ii) What are random numbers and how are they generated? Also give an 
example to explain their application. 

iii) Classify each of the following random variables as either discrete or 
continuous. 
a) The number of pages in a book. 
b) The number of questions asked in an oral examination. 
c) The life time of a light bulb. 
d) The amount of rainfall at a particular location during different 
months of 1996. 

7.2 Generate the first 6 random digits using the pseudo-random number generator 
with m = 100, a = 21, b = 7, $x_0$ = 10. 

7.3 Count the number of heads and tails when a single coin is tossed 10 times 
without throwing a coin. 

7.4 Two coins are tossed. Let Y denote the number of heads; then the possible 
numbers of heads and their corresponding probabilities are given in the adjoining 
table. 

Number of heads    Probability 
0                  0.25 
1                  0.50 
2                  0.25 

Find the sequence of the number of heads 20 times, without throwing a coin. 

7.5 Let the digits 0, 1, 2, 3, 4 represent head and 5, 6, 7, 8, 9 represent tail. Use 
random numbers to simulate 20 flips of a coin. 

7.6 Two students in a class are asked to take one of the two courses Mathematics 
(M) or Biology (B). Define the random variable Y as the number of students 
taking Biology. Write down the possible outcomes and the values assigned by 
the random variable Y. 

7.7 Two coins are tossed and let the random variable Y denote the number of 
heads. Write down the possible outcomes and the values assigned by the 
random variable. 

7.8 There are three children in a family. Let the random variable denote the 
number of boys in the family. Write down the possible outcomes and the values 
assigned by the random variable, assuming equal chances for boys and girls. 

7.9 Four balls are drawn from a bag containing 5 white and 3 black balls. If X 
denotes the number of white balls drawn, then write down the possible values of 
the random variable X. 

7.10 Differentiate between discrete and continuous random variables and give an 
example of each. 

7.11 Fill in the blanks: 
i) Random numbers are ________ by some process. 
ii) Random numbers are obtained in such a way that each digit has ________ 
probability. 
iii) Random variable is also called ________ variable. 
iv) A random variable assuming only a finite number of values is 
called ________ random variable. 
v) A random variable assuming all possible values in a ________ is 
called ________ random variable. 
vi) The sum of probabilities of events of a sample space is always ________. 
vii) A probability function is ________ function. 
viii) The mean of probability distribution is called ________. 
ix) E(X) = Σ x f(x) if it ________ absolutely. 
x) A random variable may be ________ or ________. 
Probability Distributions 


8.1 Introduction 


Whenever we talk about random experiments, there is a need to associate a 
numerical value with each of their outcomes in order to study them. As a result, two 
types of variables arise. 


i. Discrete random variable. ii. Continuous random variable. 


A discrete random variable almost always arises in connection with counting, 
and a continuous random variable is one whose values are typically obtained by 
measurement. 


In case of a discrete random variable, its probability distribution describes 
how much of the probability is placed on each of its possible values, with the total of 
all these probabilities equal to 1. The probability distribution of a discrete random 
variable is usually called its probability mass function. 


In case of a continuous random variable, we cannot talk about the probability 
at a point; instead we talk about the probability on any interval of the values the 
random variable takes, with the total of the probabilities equal to 1. The probability 
distribution of a continuous random variable is called its probability density function. 


A probability distribution is usually written 
with the help of a function, called its formula, or it can be described with the help of a 
two-column table (like a frequency distribution) where one column gives the values 
(intervals of values in case of continuous random variables) and the other column 
gives the probabilities. 


8.2 Probability Mass Function 


As the value of a discrete random variable is determined by the outcome of a random 
experiment, one can associate with each possible value of the discrete random 
variable the probability that the random variable will take on that value. The probability 
mass function of a discrete random variable Y describes the values of Y and the probability 
associated with each value of Y. Usually, it is written in a two-column table where 
one column gives the values of the random variable Y and the other column gives the 
probabilities associated with each value. 
Example 8.1: A coin is tossed three times and let the random variable Y denote the 
number of upturned heads. Find the probability mass function. 

Solution: There are 8 possible outcomes. The possible outcomes and the values 
assigned to them (according to the number of upturned heads) are given in the 
adjoining table: 

Table A 

Outcomes    Value of Y 
HHH             3 
HHT             2 
HTH             2 
THH             2 
HTT             1 
THT             1 
TTH             1 
TTT             0 


It is clear that Y takes the values 0, 1, 2, 3 because in three tosses of a coin (or 
when three coins are tossed at one time) there are 8 possible outcomes, and their 
detail is 

No head (all tails)    (TTT) 
One head               (HTT, THT, TTH) 
Two heads              (HHT, HTH, THH) 
Three heads            (HHH) 

No head (all tails) can occur only once so, the probability of no head is 1/8 
according to the definition of probability. 

One head can occur 3 times so, the probability is 3/8. Two heads can occur 3 
times so, the probability is 3/8. Three heads can occur once so, the probability is 1/8. 


We can write P(Y=0), P(Y=1), P(Y=2), 
P(Y=3), read as the probability that Y equals no 
head, one head, two heads and three heads 
respectively. So, the probability function is 
given in the adjoining table: 

y          0     1     2     3 
P(Y=y)    1/8   3/8   3/8   1/8 
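
The counting argument of Example 8.1 can be reproduced by listing all outcomes and counting heads. This is a small sketch; `heads_pmf` is a name introduced here for illustration and exact fractions are used so the values match 1/8, 3/8, 3/8, 1/8.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_pmf(n_tosses):
    """Probability mass function of the number of heads in n_tosses fair tosses."""
    outcomes = list(product("HT", repeat=n_tosses))       # all equally likely outcomes
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {y: Fraction(c, len(outcomes)) for y, c in sorted(counts.items())}


pmf = heads_pmf(3)
for y, p in pmf.items():
    print(y, p)
print(sum(pmf.values()))  # the probabilities add up to 1
```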
8.3 Probability Density Function 


The probability density function of a continuous random variable Y is 
specified by a smooth curve such that the total area under the curve is unity. The 
probability that Y falls in any particular interval is the area under the curve above that 
interval. 


Consider the weight (in kilograms) of a student in a class of 30 students taking 
Statistics. The weights measured to the nearest hundredth of a kilogram are 60.50, 
60.80, 55.40, 53.70, 50.75, ..., 45.00 and 49.80. The minimum weight is 45.00 and the 
maximum is 60.80. In this situation the probability histogram approaches a smooth 
curve. The area under the curve is unity and the curve cannot go below the horizontal 
axis. The probability that the weight is between 55 and 57 kg is the area under the curve 
and above this interval. 


Let a and b be two numbers and Y be the random variable. Define the 
following events: 


i) a < Y < b is the event that the value of Y is between a and b. 
ii) Y < a is the event that the value of Y is less than a. 
iii) Y > b is the event that the value of Y is greater than b. 


8.4 Simple Univariate Discrete And Continuous Distributions 


The probability distributions of discrete random variables are represented 
in tabular form by the values of the random variable 
and the corresponding probabilities. For example, when 
a die is thrown, each upturned face (1, 2, 3, 4, 5 and 6) 
has the same probability 1/6 of occurrence. Thus 
the probability distribution in tabular form is given 
in the adjoining table. 

y          1     2     3     4     5     6 
P(Y=y)    1/6   1/6   1/6   1/6   1/6   1/6 


This probability distribution can be expressed in the form of the following 
formula, such that the probabilities P(Y=y) are expressed by the function f(y): 


$$f(y) = \begin{cases} 1/6 & \text{for } y = 1, 2, 3, \ldots, 6 \\ 0 & \text{otherwise} \end{cases}$$


This is the probability distribution for the number of upturned points when a die is 
thrown. It is known as the discrete uniform distribution. It should be noted that not every 
function defined for the values of a random variable can serve as a probability 
distribution; it must satisfy the conditions given under 8.4.1. 


The simplest form of continuous distribution is the continuous uniform 
distribution, given by 


$$f(y) = \frac{1}{b - a}, \quad a < y < b \qquad (8.1)$$


Note that y takes the values between a and b. 
If a = 0 and b = 1 then 


$$f(y) = \begin{cases} 1 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

This function is shown in figure 8.1. 


Figure 8.1: Continuous uniform distribution 
The area under the probability density function is also 1: the rectangle has the 
unit interval as its width and height 1. To calculate a probability, the area under the 
curve above the interval of interest is calculated. Very often these 
distributions are based on empirical evidence or prior knowledge. 


It should be clear that in case of discrete distributions, the probability of an 
event is obtained by inserting the value of the random variable in the probability 
function, but in case of continuous distributions, the probability is obtained by 
calculating the area under the curve and above the interval on which the event is 
defined. 

It is to be noted that the probability of Y being greater than or equal to a and less 
than or equal to b, written as P(a ≤ Y ≤ b), is not equal to P(a < Y < b) in case of 
discrete distributions, because in the latter the probabilities at a and b are not included. 
In case of continuous distributions these probabilities are equal, because the area at a 
point is always zero, so the probability at a and at b is always zero, i.e., 


$$P(a \le Y \le b) = P(a < Y < b)$$


8.4.1 Properties of Probability Mass Function And Probability Density Function 


Probability Mass Function 

Let Y be a discrete random variable and P(y) be its probability mass function. 

Then P(y) must satisfy the following two conditions: 

i) $0 \le P(y) \le 1$ for each possible value of Y, 

i.e., the probability is a number between 0 and 1. 

ii) $\sum_y P(y) = 1$ 

i.e., the probabilities for all possible values y of the random 
variable Y should add up to 1. 


Example 8.2: A committee of size 3 is to be selected at random from 3 women and 5 
men. Obtain a probability distribution for the number of women selected for the 
committee. 

Solution: 
Number of women = 3, Number of men = 5, so total = 8. 

Number of selected persons = 3, so total number of sample points = $\binom{8}{3} = 56$. 


Let X be the number of women in the committee. X can be 0, 1, 2, 3 and 
the respective probabilities are 


$$P(\text{no woman}) = \frac{\binom{3}{0}\binom{5}{3}}{\binom{8}{3}} = \frac{10}{56}, \qquad P(\text{one woman}) = \frac{\binom{3}{1}\binom{5}{2}}{\binom{8}{3}} = \frac{30}{56}$$

$$P(\text{two women}) = \frac{\binom{3}{2}\binom{5}{1}}{\binom{8}{3}} = \frac{15}{56}, \qquad P(\text{three women}) = \frac{\binom{3}{3}\binom{5}{0}}{\binom{8}{3}} = \frac{1}{56}$$
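
The combinations in Example 8.2 can be evaluated with Python's `math.comb`. The sketch below assumes a hypothetical helper `selection_pmf`; `Fraction` reduces results automatically, so 30/56 is printed as 15/28.

```python
from fractions import Fraction
from math import comb


def selection_pmf(special, others, size):
    """P(X = x) when `size` items are drawn without replacement and X counts
    how many come from the `special` group."""
    total = comb(special + others, size)  # number of sample points
    return {
        x: Fraction(comb(special, x) * comb(others, size - x), total)
        for x in range(min(special, size) + 1)
    }


# Example 8.2: 3 women, 5 men, committee of 3
for x, p in selection_pmf(3, 5, 3).items():
    print(x, p)
```

The same function covers any selection without replacement from two groups, such as red and white marbles drawn from an urn.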








Example 8.3: From an urn containing 4 red and 6 white round marbles, a man 
draws three marbles at random without replacement. If X is a random variable which 
denotes the number of red marbles drawn, what is the probability distribution of X? 
Solution: 


Number of red marbles = 4, Number of white marbles = 6, total marbles = 10. 


Number of sample points = $\binom{10}{3} = 120$ 


If the random variable X denotes the number of red marbles, the possible 
values of X are 0, 1, 2, 3 with their respective probabilities as: 

$$P(X=0) = \frac{\binom{4}{0}\binom{6}{3}}{\binom{10}{3}} = \frac{20}{120} = \frac{1}{6}, \qquad P(X=1) = \frac{\binom{4}{1}\binom{6}{2}}{\binom{10}{3}} = \frac{60}{120} = \frac{1}{2}$$

$$P(X=2) = \frac{\binom{4}{2}\binom{6}{1}}{\binom{10}{3}} = \frac{36}{120} = \frac{3}{10}, \qquad P(X=3) = \frac{\binom{4}{3}\binom{6}{0}}{\binom{10}{3}} = \frac{4}{120} = \frac{1}{30}$$








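Example 8.4: Given the discrete probability mass function: 


$$P(x) = \binom{4}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{4-x} \quad \text{for } x = 0, 1, 2, 3, 4$$


Find the probability distribution. 


Solution: 

$$P(0) = \binom{4}{0}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{4} = \frac{1}{16}, \qquad P(1) = \binom{4}{1}\left(\frac{1}{2}\right)^{1}\left(\frac{1}{2}\right)^{3} = \frac{4}{16}, \qquad P(2) = \binom{4}{2}\left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{2} = \frac{6}{16}$$

$$P(3) = \binom{4}{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{1} = \frac{4}{16}, \qquad P(4) = \binom{4}{4}\left(\frac{1}{2}\right)^{4}\left(\frac{1}{2}\right)^{0} = \frac{1}{16}$$


The probability distribution of X in tabular form is: 

x        0      1      2      3      4 
P(x)    1/16   4/16   6/16   4/16   1/16 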
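
The table of Example 8.4 can be verified by evaluating the formula for each x. The function name `half_binomial_pmf` is an assumption of this sketch; note that `Fraction` prints 4/16 as 1/4 and 6/16 as 3/8.

```python
from fractions import Fraction
from math import comb


def half_binomial_pmf(n):
    """P(x) = C(n, x) * (1/2)**x * (1/2)**(n - x) for x = 0, 1, ..., n."""
    half = Fraction(1, 2)
    return {x: comb(n, x) * half**x * half**(n - x) for x in range(n + 1)}


pmf = half_binomial_pmf(4)
print(pmf)
print(sum(pmf.values()))  # total probability
```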


Example 8.5: A random variable X has the following probability distribution: 

x       -2     -1     0      1      2      3 
P(x)    0.1    k      0.2    2k     0.3    3k 

Find: 

i) k ii) P(X < 2) iii) P(X ≥ 2) iv) P(-2 < X < 2) v) P(X ≤ 1). 


Solution: Since $\sum_{x=-2}^{3} P(x) = 1$, we have 

0.1 + k + 0.2 + 2k + 0.3 + 3k = 1 or 0.6 + 6k = 1 or 6k = 1 - 0.6 = 0.4 

so that k = 0.4/6 = 1/15. 

P(X < 2) = P(x = -2) + P(x = -1) + P(x = 0) + P(x = 1) 
         = 0.1 + k + 0.2 + 2k = 0.3 + 3k = 0.3 + 3(1/15) = 0.3 + 0.2 = 0.5 

P(X ≥ 2) = P(x = 2) + P(x = 3) = 0.3 + 3k = 0.3 + 3(1/15) = 0.3 + 0.2 = 0.5 

P(-2 < X < 2) = P(x = -1) + P(x = 0) + P(x = 1) = k + 0.2 + 2k = 0.2 + 3k 
              = 0.2 + 3(1/15) = 0.2 + 0.2 = 0.4 

P(X ≤ 1) = P(x = -2) + P(x = -1) + P(x = 0) + P(x = 1) 
         = 0.1 + k + 0.2 + 2k = 0.3 + 3k = 0.3 + 3(1/15) = 0.3 + 0.2 = 0.5 
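
The steps of Example 8.5 can be checked numerically. In this sketch each probability is stored as a pair (constant part, multiple of k); the names `TABLE`, `solve_k` and `build_pmf` are hypothetical and only illustrate the condition that the probabilities add up to 1.

```python
from fractions import Fraction

# P(x) = constant + coefficient * k, as in the table of Example 8.5
TABLE = {
    -2: (Fraction(1, 10), 0),
    -1: (Fraction(0), 1),
    0: (Fraction(2, 10), 0),
    1: (Fraction(0), 2),
    2: (Fraction(3, 10), 0),
    3: (Fraction(0), 3),
}


def solve_k(table):
    """Solve sum(constant) + k * sum(coefficient) = 1 for k."""
    constants = sum(c for c, _ in table.values())
    coefficients = sum(m for _, m in table.values())
    return (1 - constants) / coefficients


def build_pmf(table):
    k = solve_k(table)
    return {x: c + m * k for x, (c, m) in table.items()}


pmf = build_pmf(TABLE)
print(solve_k(TABLE))                                 # value of k
print(sum(p for x, p in pmf.items() if x < 2))        # P(X < 2)
print(sum(p for x, p in pmf.items() if -2 < x < 2))   # P(-2 < X < 2)
```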
Example 8.6: Check whether the function given by 


$$f(y) = \begin{cases} \dfrac{y + 1}{14} & \text{for } y = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$$

is a probability function. 
Solution: Here, 


f(1) = 2/14, f(2) = 3/14, f(3) = 4/14 and f(4) = 5/14 
The values are all non-negative and add up to 1, as 


2/14 + 3/14 + 4/14 + 5/14 = 1 


So, both conditions are satisfied, and we conclude that the function is a 
probability function. 


Probability Density Function 


Let Y be a continuous random variable and f(y) be its probability density 
function. Then f(y) must satisfy the following conditions. 


i) $f(y) \ge 0$ for all y        ii) $P(-\infty < Y < \infty) = 1$ 

It means that the total area under the curve should be 1. 


Of course, not every function defined for the values of a random variable can 
serve as a probability distribution; it must satisfy the above two conditions. 


Example 8.7: Verify whether the function 


$$f(y) = \begin{cases} 1 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

is a density function. 


Solution: It is the density function of a continuous random variable because y takes all 
values between 0 and 1. To calculate the area, we have the width of the rectangle as 1 
and the height of the rectangle as 1, so 


Area = (width) (height) 
     = (1) (1) = 1 


The function is also non-negative; thus both conditions are satisfied and we 
conclude that the function is a density function. 





8.4.2 Applications 


Once the probability distribution for a random variable has been defined, 
it very often becomes easier to calculate probabilities. In case of discrete random 
variables, probabilities of events are obtained by adding the corresponding 
probabilities, but in case of continuous random variables the probability that a random 
variable falls in a certain interval is computed by calculating the area above that 
interval. 


Example 8.8: The following table gives the probability distribution for the number of 
courses enrolled during spring semester 1995 by 50 M.Sc. Statistics students. 





i) Find the probability that 


a) A student enrolled in 3 courses. 
b) A student enrolled in less than 3 courses. 
c) A student enrolled in at least 4 courses. 
d) A student enrolled in at the most 4 courses. 
e) A student enrolled in between 4 and 6 courses inclusive, 
i.e., P(4 ≤ Y ≤ 6). 
f) A student enrolled in between 4 and 6 courses (exclusive). 
Solution: 
a) The probability that a student enrolled in three courses is 
P(Y=3) = 0.16 
b) The probability that a student enrolled in less than 3 courses means 
the probability that he enrolled in 1 course or 2 courses, i.e., 
P(Y=1) + P(Y=2) = 0.02 + 0.03 
= 0.05 
c) The probability that a student enrolled in at least four courses means the 
probability of 4 or more courses, i.e., 
P(Y=4) + P(Y=5) + P(Y=6) + P(Y=7) 
= 0.40 + 0.25 + 0.16 + 0.05 
= 0.86 
d) P(Y ≤ 4) = P(Y=1) + P(Y=2) + P(Y=3) + P(Y=4) 
= 0.02 + 0.03 + 0.16 + 0.40 = 0.61 
e) P(4 ≤ Y ≤ 6) = P(Y=4) + P(Y=5) + P(Y=6) 
= 0.40 + 0.25 + 0.16 = 0.81 
f) Between 4 and 6 there is only one number, i.e., 5, so 
P(Y=5) = 0.25 

Example 8.9: The amount of time (in minutes) taken by a doctor to attend a patient is 
between 5 and 10 minutes. If we assume that the distribution followed is uniform, then 
calculate the probability that the doctor takes 


i) between 6 and 8 minutes 
ii) less than 8 minutes. 


Solution: Let Y denote the random variable, the amount of time taken by a doctor to 
attend a patient. 
Here a = 5 and b = 10, so the probability distribution is 

$$f(y) = \frac{1}{b - a}, \quad a < y < b$$
$$\phantom{f(y)} = \frac{1}{5} = 0.2, \quad 5 < y < 10$$


i) P(6 < Y < 8) is the area between 6 and 8 and is shown in figure 8.2. The width 
is 8 - 6 = 2 and the height is 0.2, so 

Fig. 8.2: The area between 6 and 8. 

P(6 < Y < 8) = (width of rectangle).(height) 
             = (2).(0.2) = 0.4 
ii) P(Y < 8) = P(5 < Y < 8) 
             = (width of rectangle).(height) 
             = (3).(0.2) = 0.6 
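
The rectangle areas in Example 8.9 follow one pattern: the overlap of the requested interval with (a, b), divided by b - a. A minimal sketch, assuming the hypothetical helper name `uniform_probability`:

```python
from fractions import Fraction


def uniform_probability(a, b, low, high):
    """P(low < Y < high) for Y uniform on (a, b): width of overlap times height 1/(b - a)."""
    low = max(low, a)    # the density is zero outside (a, b)
    high = min(high, b)
    if high <= low:
        return Fraction(0)
    return Fraction(high - low, b - a)


print(uniform_probability(5, 10, 6, 8))  # 2/5, i.e. 0.4
print(uniform_probability(5, 10, 0, 8))  # 3/5, i.e. 0.6
```

Clipping the interval to (a, b) is what turns P(Y < 8) into P(5 < Y < 8), as done in part ii) of the example.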
Example 8.10: A continuous random variable X that can assume values between 2 
and 5 has a density function given by $f(x) = \frac{2(1+x)}{27}$. 
Find 
i) P(X < 4) ii) P(3 < X < 4). 
Solution: We are given that $f(x) = \frac{2(1+x)}{27}$, $2 \le x \le 5$. 

i) P(X < 4): 
$f(2) = \frac{2(1+2)}{27} = \frac{6}{27}$ 
$f(4) = \frac{2(1+4)}{27} = \frac{10}{27}$ 
Base = 4 - 2 = 2 
$P(X < 4) = \frac{\text{Sum of parallel sides}}{2} \times \text{Base} = \frac{f(2) + f(4)}{2} \times \text{Base}$ 
$= \frac{\frac{6}{27} + \frac{10}{27}}{2} \times 2 = \frac{16}{27}$ 

ii) P(3 < X < 4): 
$f(3) = \frac{2(1+3)}{27} = \frac{8}{27}$ 
$f(4) = \frac{2(1+4)}{27} = \frac{10}{27}$ 
Base = 4 - 3 = 1 
$P(3 < X < 4) = \frac{f(3) + f(4)}{2} \times \text{Base}$ 
$= \frac{\frac{8}{27} + \frac{10}{27}}{2} \times 1 = \frac{18}{54} = \frac{1}{3}$ 
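
Because the density in Example 8.10 is a straight line, every probability is the area of a trapezium. The following sketch reproduces both answers with exact fractions. The helper names `density` and `trapezium_prob` are assumptions introduced for illustration only.

```python
from fractions import Fraction

def density(x):
    """f(x) = 2(1 + x)/27 on 2 <= x <= 5, as in Example 8.10."""
    return Fraction(2 * (1 + x), 27)

def trapezium_prob(f, lo, hi):
    """Area under a straight-line density between lo and hi:
    half the sum of the parallel sides times the base."""
    return (f(lo) + f(hi)) / 2 * (hi - lo)

p_i = trapezium_prob(density, 2, 4)    # P(X < 4)
p_ii = trapezium_prob(density, 3, 4)   # P(3 < X < 4)
total = trapezium_prob(density, 2, 5)  # whole range, should be 1
```

Checking that the area over the whole range is 1 is a quick way to confirm that the reconstructed density is a valid one.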
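Example 8.11: 
i) A continuous random variable X has a density function f(x) = 2x 
when 0 < x < 1 and zero otherwise. Find 
(a) $P\left(X < \frac{1}{2}\right)$, (b) $P\left(\frac{1}{4} < X < \frac{1}{2}\right)$. 

ii) If f(x) has probability density $kx^2$, 0 < x < 1, determine k and find the 
probability that $\frac{1}{3} < X < \frac{1}{2}$. 
Solution: 
i) f(x) = 2x, 0 < x < 1, 
= 0, otherwise. 

a) $P\left(X < \frac{1}{2}\right) = \int_0^{1/2} f(x)\,dx = \int_0^{1/2} 2x\,dx = 2\left[\frac{x^2}{2}\right]_0^{1/2} = \left(\frac{1}{2}\right)^2 - 0 = \frac{1}{4}$ 

b) $P\left(\frac{1}{4} < X < \frac{1}{2}\right) = \int_{1/4}^{1/2} f(x)\,dx = \int_{1/4}^{1/2} 2x\,dx = 2\left[\frac{x^2}{2}\right]_{1/4}^{1/2}$ 
$= \left(\frac{1}{2}\right)^2 - \left(\frac{1}{4}\right)^2 = \frac{1}{4} - \frac{1}{16} = \frac{3}{16}$ 

ii) f(x) will be a probability density function if $\int_0^1 f(x)\,dx = 1$, i.e., 
$\int_0^1 kx^2\,dx = k\left[\frac{x^3}{3}\right]_0^1 = \frac{k}{3}(1 - 0) = \frac{k}{3}$ 
So $1 = \frac{k}{3}$, which gives k = 3. 
Hence the probability density function is $f(x) = 3x^2$, 0 < x < 1. 

Now 
$P\left(\frac{1}{3} < X < \frac{1}{2}\right) = \int_{1/3}^{1/2} f(x)\,dx = \int_{1/3}^{1/2} 3x^2\,dx = 3\left[\frac{x^3}{3}\right]_{1/3}^{1/2}$ 
$= \left(\frac{1}{2}\right)^3 - \left(\frac{1}{3}\right)^3 = \frac{1}{8} - \frac{1}{27} = \frac{19}{216}$ 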
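
The integrals in Example 8.11 all involve a single power of x, so they can be checked exactly with the power rule. The sketch below is an illustration under that assumption. `integrate_monomial` is a hypothetical helper, not a library function.

```python
from fractions import Fraction

def integrate_monomial(k, power, lo, hi):
    """Exact integral of k * x**power from lo to hi (power >= 0),
    using the power rule: k * (hi**(power+1) - lo**(power+1)) / (power+1)."""
    n = power + 1
    return Fraction(k) * (Fraction(hi) ** n - Fraction(lo) ** n) / n

half, quarter, third = Fraction(1, 2), Fraction(1, 4), Fraction(1, 3)

p_a = integrate_monomial(2, 1, 0, half)        # P(X < 1/2) for f(x) = 2x
p_b = integrate_monomial(2, 1, quarter, half)  # P(1/4 < X < 1/2)

# Part ii): choose k so that the total area of k*x**2 on (0, 1) equals 1.
k = 1 / integrate_monomial(1, 2, 0, 1)
p_c = integrate_monomial(k, 2, third, half)    # P(1/3 < X < 1/2)
```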
Example 8.12: A continuous random variable X has probability density function 
f(x) = cx for 0 < x < 2. Determine 
i) c, ii) P(1 < X < 1.5), iii) P(X < 1.5) 

Solution: f(x) = cx, 0 < x < 2 

i) We know that the total area under the density is one: 
$\text{Area} = \frac{f(0) + f(2)}{2} \times \text{Base} = 1$ 
f(0) = 0, f(2) = 2c 
Base = 2 - 0 = 2 
$\frac{0 + 2c}{2} \times 2 = 1$, or 2c = 1, or $c = \frac{1}{2}$ 
Hence $f(x) = \frac{x}{2}$, 0 < x < 2 

ii) P(1 < X < 1.5): 
$f(1) = \frac{1}{2} = 0.5$, $f(1.5) = \frac{1.5}{2} = 0.75$ 
Base = 1.5 - 1 = 0.5 
$P(1 < X < 1.5) = \frac{f(1) + f(1.5)}{2} \times \text{Base} = \frac{0.5 + 0.75}{2} \times 0.5 = 0.3125$ 

iii) P(X < 1.5): 
f(0) = 0, $f(1.5) = \frac{1.5}{2} = 0.75$ 
Base = 1.5 - 0 = 1.5 
$P(X < 1.5) = \frac{f(0) + f(1.5)}{2} \times \text{Base} = \frac{0 + 0.75}{2} \times 1.5 = 0.5625$ 
Example 8.13: A continuous random variable X, which can assume values between 
2 and 8 inclusive, has a density function given by a(x + 3), where a is a constant. 
Find: 
i) a ii) P(3 < X < 5) 
Solution: $f(x) = a(x + 3)$, $2 \le x \le 8$ 
i) We know that 
$\text{Area} = \frac{f(2) + f(8)}{2} \times \text{Base} = 1$ 
f(2) = 5a 
f(8) = 11a 
Base = 8 - 2 = 6 
$\frac{5a + 11a}{2} \times 6 = 1$ 
or 48a = 1 
or $a = \frac{1}{48}$ 
Hence $f(x) = \frac{x + 3}{48}$, $2 \le x \le 8$ 

ii) P(3 < X < 5): 
$f(3) = \frac{3 + 3}{48} = \frac{6}{48}$ 
$f(5) = \frac{5 + 3}{48} = \frac{8}{48}$ 
Base = 5 - 3 = 2 
$P(3 < X < 5) = \frac{f(3) + f(5)}{2} \times \text{Base} = \frac{\frac{6}{48} + \frac{8}{48}}{2} \times 2 = \frac{14}{48}$ 
8.5 Drawing of Probability Mass Function and 
Probability Density Function 


The Probability Function: The probability functions can be presented graphically 
in the following two ways: 


i) Probability Histogram 


To draw a probability histogram, we take the values of the random variable along the 
x-axis and the probabilities along the y-axis. Adjacent rectangles are drawn against each 
value such that the height of each rectangle is equal to the probability at that point and 
the width of each rectangle is one, extending 0.5 units to the left of the value and 0.5 units to 
the right. Since the width is one unit, the area of each rectangle equals 
its probability. The advantage of drawing a probability histogram is that the 
discrete probability distribution can be approximated by a continuous curve. 

Example 8.14: Consider the following probability distribution and draw 
a probability histogram. 

Value | 0 | 1 | 2 | 3 | 4 
Probability | 0.08 | 0.26 | 0.34 | 0.23 | 0.09 

To construct a probability histogram, the first step is to mark the 
values of the random variable, i.e., 0, 1, 2, 3 and 4, on the x-axis as mid points. 
The second step is to draw adjacent rectangles of width 1 on each point, 
going half way to each side from the mid points, with heights that 
represent the corresponding probabilities of 0.08, 0.26, 0.34, 0.23 and 0.09 respectively. 
It is shown in Figure 8.3. 

Figure 8.3: Probability Histogram 


ii) Bar Chart 
A bar chart is drawn with the values of the random variable along the x-axis and 
the probabilities along the y-axis. The height of each bar equals the probability of the 
corresponding value. 

Data of the above example is taken to explain the procedure. The values of 
the random variable are 0, 1, 2, 3 and 4. As a first step these are taken along the 
x-axis. The second step is to draw bars against these points with a height equal 
to the corresponding probability. The probability corresponding to 0 is 0.08, 
so a bar is drawn on 0 with height 0.08; a bar of height 0.26 is drawn on point 
1; a bar of height 0.34 is drawn on point 2; a bar of height 0.23 is drawn 
on point 3 and a bar of height 0.09 is drawn on point 4. It is shown in Figure 8.4. 

Fig 8.4: Bar Chart 
Probability Density Function: 


If the values of the random variable are very close to each other, then the 
probability histogram of the discrete random variable can be approximated by a 
smooth curve, and the probability corresponding to each interval on the x-axis is the 
area under the curve over that interval. This is the situation in the case of a continuous 
random variable. So, probability density functions are presented by smooth curves, and 
probabilities are calculated by computing the area under the curve corresponding to the 
event. 


8.6 Expectation and variance of the simple discrete 
random variable. 

Expected value: Let Y be a discrete random variable with probability function P(y). 
The mathematical expectation, or expectation, of the discrete random variable Y, 
denoted by E(Y), is defined by: 

$E(Y) = \sum y\,P(y)$, where the sum is over all values of Y (8.2) 

This expectation of a random variable Y is the mean of its probability 
distribution, i.e., E(Y) is an alternative notation for the population mean $\mu$. Similarly, 
$E(Y^2) = \sum y^2 P(y)$, where the sum is over all y. (8.3) 


Variance: 


The variance of a random variable Y is the variance of its probability 
distribution and is 

$\sigma^2 = E(Y - \mu)^2 = \sum (y - \mu)^2 P(y)$ for all y (8.4) 
where $\sigma^2$ denotes the variance. 

It should be noted that the variance is the expected value of $(Y - \mu)^2$. 

We know that $E(Y) = \sum y P(y) = \mu$, so 

$\sigma^2 = E(Y - \mu)^2 = \sum (y - \mu)^2 P(y)$ 
$= \sum (y^2 + \mu^2 - 2\mu y) P(y)$ 
$= \sum y^2 P(y) + \mu^2 \sum P(y) - 2\mu \sum y P(y)$ 
$= \sum y^2 P(y) + \mu^2 - 2\mu^2$ (as $\sum P(y) = 1$ and $\sum y P(y) = \mu$) 
$= \sum y^2 P(y) - \mu^2$ 
$= \sum y^2 P(y) - \left[\sum y P(y)\right]^2$ 
$= E(Y^2) - [E(Y)]^2$ 
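
The identity derived above can be checked numerically. The sketch below uses the distribution of Example 8.14 (values 0 to 4 with probabilities 0.08, 0.26, 0.34, 0.23 and 0.09) and compares the definition of the variance with the shortcut form. The function name `expectation` is an assumption made for this illustration.

```python
from fractions import Fraction

# Probability distribution taken from Example 8.14.
pmf = {0: Fraction("0.08"), 1: Fraction("0.26"), 2: Fraction("0.34"),
       3: Fraction("0.23"), 4: Fraction("0.09")}

def expectation(pmf, g=lambda y: y):
    """E[g(Y)] = sum of g(y) * P(y) over all values y."""
    return sum(g(y) * p for y, p in pmf.items())

mu = expectation(pmf)                                       # E(Y)
var_definition = expectation(pmf, lambda y: (y - mu) ** 2)  # E(Y - mu)^2
var_shortcut = expectation(pmf, lambda y: y ** 2) - mu ** 2 # E(Y^2) - [E(Y)]^2
```

Exact fractions are used so that the two forms agree exactly rather than only up to rounding error.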


8.6.1 Properties of Expectation: 


i) If c is a constant, then 
E(c) = c 
ii) If a and b are constants and Y is a random variable, then 
E(bY + a) = bE(Y) + a 
If a = $-\mu$ and b = 1, then E(Y - $\mu$) = 0 

iii) If X and Y are two random variables, then the expected value of their sum is 
the sum of their expected values, i.e., 
E(X + Y) = E(X) + E(Y) 

For the difference of two variables, the following result holds true: 
E(X - Y) = E(X) - E(Y) 

iv) If X and Y are two independent random variables, then the expected value of 
their product is the product of their expected values, i.e., 
E(XY) = E(X) E(Y) 
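
The four properties can be verified on a small case. The sketch below assumes a hypothetical pair of independent fair dice X and Y, which is not an example from the text, and checks each property with exact fractions.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Independence: each pair (x, y) has probability 1/6 * 1/6 = 1/36.
p_joint = Fraction(1, 36)

def E(g):
    """Expectation of g(X, Y) over the joint distribution of two fair dice."""
    return sum(g(x, y) * p_joint for x, y in product(faces, faces))

EX = E(lambda x, y: x)
EY = E(lambda x, y: y)

prop_i = E(lambda x, y: 7)              # E(c) = c
prop_ii = E(lambda x, y: 3 * x + 5)     # E(bX + a) = bE(X) + a
prop_iii_sum = E(lambda x, y: x + y)    # E(X + Y) = E(X) + E(Y)
prop_iii_diff = E(lambda x, y: x - y)   # E(X - Y) = E(X) - E(Y)
prop_iv = E(lambda x, y: x * y)         # E(XY) = E(X)E(Y) under independence
```

Property iv) relies on independence. For dependent variables the product rule need not hold.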


Example 8.15: The staff of the Department of Mathematics and Statistics at the 
University of Agriculture, Faisalabad reckon that the number of microcomputers 
getting out of order in a year is well approximated by the following probability 
distribution. 

y | 0 | 1 | 2 | 3 | 4 
P(y) | 0.3 | 0.3 | 0.2 | 0.1 | 0.1 

Find the average and variance of the number of computers getting out of 
order. Also find E(5 - 3Y) by using the properties of expectation. 










Solution: Average = E(Y) = $\sum y P(y)$ 
Thus $\mu = \sum y_i P(y_i) = 1.4$ 

Variance = $\sum [y - E(Y)]^2 P(y)$ 
Calculations for the variance are: 

y | P(y) | y - E(Y) | [y - E(Y)]² | [y - E(Y)]² P(y) 
0 | 0.3 | -1.4 | 1.96 | 0.588 
1 | 0.3 | -0.4 | 0.16 | 0.048 
2 | 0.2 | 0.6 | 0.36 | 0.072 
3 | 0.1 | 1.6 | 2.56 | 0.256 
4 | 0.1 | 2.6 | 6.76 | 0.676 
Total | 1.0 | | | 1.640 

Thus 
Variance = 1.64 
Using property ii), E(5 - 3Y) = 5 - 3E(Y). We have E(Y) = 1.4, so 
E(5 - 3Y) = 5 - 3(1.4) 
= 5 - 4.2 
= 0.8 


Example 8.16: For the probability distribution of X given below, find 
(i) E(X), (ii) $E(X^2)$ 





Solution: : 


1 
2 
3 










es ere 
ee Le 
i 
nen MEL Bs 








$E(X) = \sum x P(x) = \frac{11}{10} = 1.1$ 


mas 
10 : 
Example 8.17: A random variable X has the probability distribution given below: 


E(X?) = Xx’ P(x) = oa 





Find i) E(X) ii) E(3X + 5) iii) $E(X^2)$ 

iv) Show that E(3X + 5) = 3E(X) + 5 


STATE eels Sell aa Ea call 
Clie ea i ara Bll 
hia? 














3 | 1/10 | 3/10 | 9/10 | 14/10 
Total | 10/10 = 1 | 11/10 | 21/10 | 83/10 

i) $E(X) = \sum x P(x) = \frac{11}{10} = 1.1$ 

ii) $E(3X + 5) = \sum (3x + 5) P(x) = \frac{83}{10} = 8.3$ 

iii) $E(X^2) = \sum x^2 P(x) = \frac{21}{10} = 2.1$ 

iv) E(3X + 5) = 3E(X) + 5 
8.3 = 3(1.1) + 5 
8.3 = 3.3 + 5 
8.3 = 8.3 

Example 8.18: A, B and C, in that order, cut a pack of cards, replacing them after 
each cut, on the condition that the first who cuts a heart shall win a prize of Rs. 37. Find 
their respective expectations. 

Solution: Let p be the probability of getting a heart, $p = \frac{13}{52} = \frac{1}{4}$, 
and q be the probability of not getting a heart, $q = \frac{3}{4}$. 

A can cut a heart on the 1st, 4th, 7th, ... draw with respective probabilities 
$p,\ q^3 p,\ q^6 p,\ \ldots$ 

B can cut a heart on the 2nd, 5th, 8th, ... draw with respective probabilities 
$qp,\ q^4 p,\ q^7 p,\ \ldots$ 

C can cut a heart on the 3rd, 6th, 9th, ... draw with respective probabilities 
$q^2 p,\ q^5 p,\ q^8 p,\ \ldots$ 

Then the probability that A cuts the heart is 
$P(A) = p + q^3 p + q^6 p + \ldots$ 
$P(A) = \frac{p}{1 - q^3} = \frac{\frac{1}{4}}{1 - \left(\frac{3}{4}\right)^3} = \frac{16}{37}$, so the expected amount for A $= 37 \times \frac{16}{37} = $ Rs. 16 

$P(B) = qp + q^4 p + q^7 p + \ldots$ 
$P(B) = \frac{qp}{1 - q^3} = \frac{\left(\frac{3}{4}\right)\left(\frac{1}{4}\right)}{1 - \left(\frac{3}{4}\right)^3} = \frac{12}{37}$, so the expected amount for B $= 37 \times \frac{12}{37} = $ Rs. 12 

$P(C) = q^2 p + q^5 p + q^8 p + \ldots$ 
$P(C) = \frac{q^2 p}{1 - q^3} = \frac{\left(\frac{3}{4}\right)^2\left(\frac{1}{4}\right)}{1 - \left(\frac{3}{4}\right)^3} = \frac{9}{37}$, so the expected amount for C $= 37 \times \frac{9}{37} = $ Rs. 9 
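
The geometric-series argument of Example 8.18 generalises to any number of players taking turns. The sketch below is illustrative only. The function name `win_probabilities` is an assumption, and the closed form used is the sum of the series $q^i p + q^{i+m} p + \ldots$ for m players.

```python
from fractions import Fraction

def win_probabilities(p, players):
    """Probability that each player is the first to succeed when the
    players take turns in order. Player i (0-based) wins with probability
    q**i * p / (1 - q**players), the sum of a geometric series."""
    q = 1 - p
    return [q ** i * p / (1 - q ** players) for i in range(players)]

p_heart = Fraction(13, 52)               # 13 hearts in a pack of 52 cards
probs = win_probabilities(p_heart, 3)    # A, B, C in that order
prize = 37
expected = [prize * pr for pr in probs]  # expected amounts in rupees
```

The three probabilities add up to one, so the expected amounts add up to the full prize.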
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Example 8.19: A bag contains 6 Red and 4 white balls. A person draws 2 
balls at random without replacement, being promised 15 rupees for each red ball and 
20 rupees for each white ball he draws. Find his expectation. 

Solution: Red balls = 6, White balls = 4, Total = 10, Drawn balls = 2 
Rupees for each red ball = 15 
Rupees for each white ball = 20 

The respective probabilities of the draws are: 

$P(\text{2 red balls}) = \frac{\binom{6}{2}\binom{4}{0}}{\binom{10}{2}} = \frac{15}{45}$ 

$P(\text{one red and one white ball}) = \frac{\binom{6}{1}\binom{4}{1}}{\binom{10}{2}} = \frac{24}{45}$ 

$P(\text{2 white balls}) = \frac{\binom{6}{0}\binom{4}{2}}{\binom{10}{2}} = \frac{6}{45}$ 

Hence the required expectation is 

$E = 30\left(\frac{15}{45}\right) + 35\left(\frac{24}{45}\right) + 40\left(\frac{6}{45}\right)$ 
= 10 + 18.67 + 5.33 
= 34 
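
The calculation in Example 8.19 can be reproduced by looping over the possible numbers of red balls. This is a sketch under the example's own figures. The variable names are chosen for illustration, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

red, white, drawn = 6, 4, 2
pay_red, pay_white = 15, 20
total_ways = comb(red + white, drawn)  # C(10, 2) = 45

probs = {}
expectation = Fraction(0)
for r in range(drawn + 1):             # r = number of red balls drawn
    w = drawn - r                      # remaining balls are white
    p = Fraction(comb(red, r) * comb(white, w), total_ways)
    probs[r] = p
    expectation += (pay_red * r + pay_white * w) * p
```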


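Example 8.20: If $f(x) = \frac{6 - |7 - x|}{36}$ for x = 2, 3, 4, 5, ...., 12, then find the mean and 
variance of the random variable X. 
Solution: 

x | f(x) | x f(x) | x² f(x) 
2 | 1/36 | 2/36 | 4/36 
3 | 2/36 | 6/36 | 18/36 
4 | 3/36 | 12/36 | 48/36 
5 | 4/36 | 20/36 | 100/36 
6 | 5/36 | 30/36 | 180/36 
7 | 6/36 | 42/36 | 294/36 
8 | 5/36 | 40/36 | 320/36 
9 | 4/36 | 36/36 | 324/36 
10 | 3/36 | 30/36 | 300/36 
11 | 2/36 | 22/36 | 242/36 
12 | 1/36 | 12/36 | 144/36 
Total | 36/36 = 1 | 252/36 | 1974/36 

Mean = E(X) = $\sum x f(x) = \frac{252}{36} = 7$ 

Variance (X) = $\sum x^2 f(x) - \left[\sum x f(x)\right]^2$ 
$= \frac{1974}{36} - (7)^2 = 54.83 - 49 = 5.83$ 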

Example 8.21: Find the missing value such that the given distribution is a probability 
distribution of X. 

x | 2 | 3 | 4 | 5 | 6 
f(x) | 0.01 | 0.25 | 0.4 | A | 0.04 

If Y = 2X - 8, then show that 

i) E(Y) = 2E(X) - 8 

ii) Var(Y) = 4 Var(X) 





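Solution: We know that the sum of the probabilities is one: 

$\sum f(x) = 1$ 
0.01 + 0.25 + 0.4 + A + 0.04 = 1 
A + 0.7 = 1 
A = 1 - 0.7 
A = 0.3 

x | f(x) | x f(x) | x² f(x) | y = 2x - 8 | y f(y) | y² f(y) 
2 | 0.01 | 0.02 | 0.04 | -4 | -0.04 | 0.16 
3 | 0.25 | 0.75 | 2.25 | -2 | -0.50 | 1.00 
4 | 0.40 | 1.60 | 6.40 | 0 | 0.00 | 0.00 
5 | 0.30 | 1.50 | 7.50 | 2 | 0.60 | 1.20 
6 | 0.04 | 0.24 | 1.44 | 4 | 0.16 | 0.64 
Total | 1.00 | 4.11 | 17.63 | | 0.22 | 3.00 

$E(X) = \sum x f(x) = 4.11$ 
$E(X^2) = \sum x^2 f(x) = 17.63$ 
$\text{Var}(X) = \sum x^2 f(x) - \left[\sum x f(x)\right]^2 = 17.63 - (4.11)^2 = 0.7379$ 
$E(Y) = \sum y f(y) = 0.22$ 
$E(Y^2) = \sum y^2 f(y) = 3$ 
$\text{Var}(Y) = \sum y^2 f(y) - \left[\sum y f(y)\right]^2 = 3 - (0.22)^2 = 2.9516$ 

i) E(Y) = 2E(X) - 8 = 2(4.11) - 8 = 8.22 - 8.00 
0.22 = 0.22 
ii) Var(Y) = 4 Var(X) = 4(0.7379) 
2.9516 = 2.9516 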
Example 8.22 

i) Given a random variable X with E(X) = 0.63 and Var(X) = 0.2331, find $E(X^2)$. 
ii) Given that $E(X^2) = 400$ and S.D.(X) = 12, find E(X). 
iii) Given that E(X) = 200 and C.V.(X) = 7%, find Var(X). 

Solution: 

a) We have $\text{Var}(X) = E(X^2) - [E(X)]^2$, so 
$E(X^2) = \text{Var}(X) + [E(X)]^2 = 0.2331 + (0.63)^2 = 0.63$ 

b) We have $\text{S.D.}(X) = \sqrt{E(X^2) - [E(X)]^2}$ 
$12 = \sqrt{400 - [E(X)]^2}$ 
Squaring both sides, we get 
$144 = 400 - [E(X)]^2$ 
or $[E(X)]^2 = 400 - 144 = 256$ 
so that $E(X) = \sqrt{256} = 16$ 

c) We have $\text{C.V.}(X) = \frac{\text{S.D.}(X)}{E(X)} \times 100$ 
$7 = \frac{\text{S.D.}(X)}{200} \times 100$ 
$7 = \frac{\text{S.D.}(X)}{2}$ 
S.D.(X) = 7(2) = 14 
so that Var(X) = $(14)^2$ = 196 

8.9 Distribution Function 


Very often we are interested in calculating the chances that the value of a 
random variable will remain at or below a certain fixed value. For example, what are 
the chances that a student will get not more than 80% marks? What are the chances 
that 5 tosses of a coin will produce not more than three heads? In these situations, we 
are concerned with the probability that a given random variable Y will take on values 
that are less than or equal to some fixed value y. Mathematically, it is written as 
$P(Y \le y)$. 

The probability $P(Y \le y)$, taken over the possible values of y, is called the distribution 
function (DF) or cumulative distribution function (cdf) of the random variable Y. It is 
usually denoted by F(y), so we can write 
$F(y) = P(Y \le y)$ over possible values of y (8.5) 

For a continuous random variable with density function f, the integral from $-\infty$ to y 
gives the distribution function: 

$F(y) = \int_{-\infty}^{y} f(t)\,dt$ 

The function has the following properties: 
i) $F(-\infty) = 0$ 

ii) $F(+\infty) = 1$ 

iii) $F(y_1) \le F(y_2)$ if $y_1 < y_2$ 

iv) F(y) is continuous at least on the right of each y. 


Example 8.23: A coin is tossed three times. The probability distribution for the number of 
heads (Y) is given below: 

y | 0 | 1 | 2 | 3 
P(y) | 1/8 | 3/8 | 3/8 | 1/8 

Find the distribution function F(y). 
Solution: From the definition of F(y), we have 

i) $F(y) = P(Y \le y)$ and Y takes the values 0, 1, 2, 3. No value of Y is less than 0. 
So, P(Y < 0) = 0. 

ii) For y = 0, $F(0) = P(Y \le 0) = P(Y = 0) = 1/8$. There is no integral value between 0 and 1, so this 
probability remains 1/8 until Y reaches 1. 

iii) For y = 1, $F(1) = P(Y \le 1) = P(Y = 0) + P(Y = 1) = \frac{1}{8} + \frac{3}{8} = \frac{4}{8}$. 
It remains 4/8 until Y reaches 2. 

iv) For y = 2, $F(2) = P(Y \le 2) = P(Y = 0) + P(Y = 1) + P(Y = 2) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} = \frac{7}{8}$. 
It remains 7/8 until Y reaches 3. 

At y = 3, $F(3) = P(Y \le 3)$ = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) 
= 1/8 + 3/8 + 3/8 + 1/8 = 1 

So, this probability is 1 at y = 3. 
As Y does not take any value beyond 3, there is no probability beyond 3. 
Thus F(y) remains 1 for y > 3. 
These results are summarized below and the graph is shown in Figure 8.5. 

F(y) = 0, for y < 0 
= 1/8, for $0 \le y < 1$ 
= 4/8, for $1 \le y < 2$ 
= 7/8, for $2 \le y < 3$ 
= 1, for $y \ge 3$ 

Figure 8.5: Distribution function 
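
The step-by-step accumulation in Example 8.23 can be sketched as follows. The function name `F` mirrors the notation of the text. Everything else, including the use of `itertools.accumulate`, is an illustrative choice rather than part of the original solution.

```python
from fractions import Fraction
from itertools import accumulate

# Number of heads in three tosses of a fair coin.
values = [0, 1, 2, 3]
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Running totals give the value of F at each jump point.
cumulative = list(accumulate(probs))

def F(y):
    """F(y) = P(Y <= y): add the probabilities of all values not exceeding y."""
    total = Fraction(0)
    for v, p in zip(values, probs):
        if v <= y:
            total += p
    return total
```

Evaluating `F` between the jump points, for example at y = 0.5 or y = 2.9, shows the flat steps of the distribution function of a discrete variable.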





Exercise 8 (Ans. on Page 258) 

8.1 What are random numbers? How can they be generated? Explain the 
applications of random numbers. 

8.2 Three balls are drawn from a bag containing 5 white and 3 black balls. If X 
denotes the number of white balls drawn from the bag, then find the 
probability distribution of X. 

8.3 There are seven candidates for three positions of typist. Four of the 
candidates know Urdu typing while the other three do not know it. If the three 
candidates are selected at random, find the probability distribution of the 
number of persons knowing Urdu typing among those selected. 

8.4 i) What is meant by a probability distribution? Distinguish between 
discrete and continuous random variables by giving examples. 

ii) A coin is tossed 4 times. If X denotes the number of tails, what is the 
probability distribution of X? Draw a probability histogram. 

iii) A bag contains 4 red and 6 black balls. A sample of 4 balls is selected 
from the bag without replacement. Let X be the number of red balls; 
find the probability distribution of X. 

8.5 i) Define probability function and give an example to explain it. 

ii) Two fair dice are thrown and Y denotes the product of the two scores. 
Obtain the probability distribution of Y. 

8.6 i) What properties should a mathematical function possess to be a 
probability function and a probability density function? 

ii) Check whether the following functions satisfy the conditions of a 
probability function: 

a. f(y) = 1/4 for y = 1, 2, 3, 4, 5 
b. f(y) = 1/y for y = 1, 2, 3, 4 
c. f(y) = y/15 for y = 0, 1, 2, 3, 4, 5 
d. f(y) = (6 - y²)/6 for y = 0, 1, 2, 3 


8.7 Determine the value of c so that the function can serve as a probability 
function of a random variable. 
a) cy for Y= B2,3,4,5 
b) d-dc for 204.2... 


8.8 A random variable Y takes values 0, 1, 2, 3 with respective probabilities 
1/4 (1 + 360), 1/4 (1-8), 1/4 (1 + 26), 1/4 (1 — 46). For what values of @ is this a 


valid probability function? 


8.9 i) Given the following probability distribution: 


| on ee ee ee 





1/126 20/126 60/126 40/126 5/126 


Verify that E (2X + 3) =2E (Xx) +3 
ii) Let X be a random variable with probability distribution: 
2 3 





. un and Var(X) b) The probability distribution of the random variable 
Pa2re 
Using the probability distribution of Y, determine E(Y) and Var (Y). 


8.10 The following table gives the probability distribution of the random variable 
Y, the number of courses taught by a teacher during spring semester in the 


University of Agriculture Faisalabad. 


8.11 


8.12 


8.13 


8.14 


8.15 





Find the probability that: 
i) Teacher taught 2 courses 
ii) Teacher taught less than 4 courses. 


iii) Teacher taught between 2 and 5 courses 

iv) Teacher taught atleast 3 courses. | 

v) Teacher taught at the most 4 courses. 
A box contains five slips of paper marked 1,2,3,4 and 5. Two slips are 
selected without replacement, list the possible values for each of the following 
random variables: | 
i) The sum of the two numbers on two slips. 


ii) The difference between the first and second number. 


Four randomly selected students from a class are asked their opinion about the 
teaching system as satisfactory (S) or not satisfactory (NS). Let Y denote the 


number of students saying satisfactory. Write down the possible outcomes 
and the possible Y values. 


A point is randomly selected on the surface of a lake that has maximum depth 
of 30 feet. Let y denote the depth of the lake at the randomly selected point. 


What are possible values of Y? Is Ya discrete variable or a continuous 
variable? 


Calculate mean and variance of the following probability distribution: 


Po CHRECHERESES 
roy [0a [0203 oar |or 


i) A continuous random variable X having values only between 0 and 4 









‘ , : 1 : : - 
has a density function given by: f(x) = es ax, where a is a constant 


find aja b) PU <X <2). 


8.16 


8.17 


8.18 


8.19 


8.20 
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ii) A continuous random variable X has probability density function 
giving f(x) = cx for 0 < X <2. Find 


(a)c (b) Probability that 1<X<1.5  (c) Probability that X< 1.5 
iii) If f(x) has probability density kx’, 0 < X < 1, determine its kind and 


find the probability that ; <X< : 


What is a random variable? Distinguish between discrete and continuous 


random variable, giving examples. 


Find the probability distribution of the number of boys in families with three 
children, assuming equal probabilities for boys and girls. 


From lot containing 12 items, 4 of which are defective, 5 are chosen at 
random. If X is the number of defective items found in the sample, write 


down 
(i) The probability distribution of X (ii) P(X < 1) 


‘ 
iii) Verify }°[P@]=1 

x=0 
i) Define continuous random variable and its probability distribution. 


ii) Find the constant k so that the function f(x) defined as follows may be 


a density function. 
f(x) = = a<X<b 


= 0, elsewhere 
(i) What do you mean by expected value? What are the properties of 


expectation? 


(ii) Given the following discrete probability distribution: 


eres Ae Pe oe 
6/36 6/36 | 4/36 | 2/36 


Compute its mean, variance, standard deviation and coefficient of variation. 


8.21 


8.22 


8.23 


8.24 


8.25 


8.26 


8.27 


Let X be a random variable with probability distribution as follows: 


Lt Papi aa ah yb eo | >| 





Ds 1 


Find mean and variance. 





i) A continuous random variable X has a density function 
fx) = < for X=2to X=4. Find 

a) P(X < 3.5) b) P(2.4<X<3.5) c)P(X=1.5). 

(ii) | Acontinuous random variable X has a density function 


fe) =2x, 0<x<1.Find 


a) P(X=5) b)P(X>2)~ oc) P(7<X <5) 
A continuous random variable X which can assume values between X = 2 and 
X = 8 inclusive has a density function given by a (x + 30), where a is a 
constant. Find i) a ii) P(3 < X < 6) iii) P(X < 6) iv) P(X > 4). 


In a summer season, a dealer of desert room coolers can earn Rs. 800 per day 
if the day is hot and can earn Rs. 200 per day if it is fair and loses Rs. 50 per 
day if it is cloudy. Find his mathematical expectation if the probability of the 
day being hot is 0.40 and for being cloudy it is 0.35 


A committee of size 5 is to be selected from 3 female and 5 male members. 
Find the expected number of female members on the committee. 


What are the properties of probability function and probability density 
function? 


i) Explain the concept of distribution function. 


ii) 10 vegetable cans, all of the same size, have lost their labels. It is 
known that 5 contain tomatoes and 5 contain corns. If 5 cans are 
selected at random, then find the probability distribution and the 
distribution function for the number of tomato cans in the sample. 
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8.28 i) Define the expected value for a random variable. 
ii) The number of automobile accidents in a city are: 1,2,3,4 with corresponding 


probabilities 1/8, 2/8, 2/8 and 3/8. What is the expected number of daily 
accidents? 


8.29 A and B throw a die for a prize of Rs. 11. Which is to be won by the player 
who first throws a 6. If A has the firs throw, what are their respective 
expectations? 

8.30 A bag contains 2 white and 2 black balls. Three men A, B and C draw a ball 
and don’t replace it. The person who draws the white ball first receives 


Rs. 12. What are their respective expectations? 


8.31 Three balls are drawn from a bag containing 5 white and 3 black balls. If X 
denotes the number of white balls drawn from the bag, then find the 
probability distribution of X. Also find its mean and variance. 


8.32 A coin is biased such that a head is thrice as likely to occur as a tail. Find the 
probability distribution of heads and also find the mean and variance of the 


distribution when it is tossed 4 times. 


8.33 Approximately 10% of the glass bottles coming from a production line have 
serious defects. If two bottles are selected at random, find the expected 


number of bottles that having serious defects. 


8.34 A random variable X takes the values —3, -2, 2, 3 and 4 with probabilities 
P(X) equal to 1/5, 1/10, 1/10, 1/5 and 2/5 respectively. Compute E(X) and 
show that E. (5X + 10) = 5E(X) + 10. Also compute the variance (X) and 
variance (5X + 10). Find the ratio of two variances. 


fy 


= 
8.35. Jfif(x) = a for X = 2,3,4,...., 12, then find the mean and variance of 


the random variable X. 


8.36 


8.37 


8.38 


For the following Probability distribution. Find 
i) E(X), i) E(X), iti) E[IX-EQY. 





i) Define expectation of a random variable 
ii) The probability distribution of a discrete random variable X is given by 
3 1 x 3 3-x 
x)= —||—|. ,x=0,1,2;3 
ol IG) (4) 
Find E(X) and E(X*). 


Against each statement, write T for a true and F for a false statement. 

i) A random variable is also named as a chance variable. 

ii) The number of accidents occurring on G.T. road during one month is 
an example of a continuous random variable. 

iii) The probability cannot exceed 1. 
iv) The range of a continuous random variable is from '0' to 'n'. 
v) The distribution function is an increasing function. 

vi) The expectation of a random variable is also named as the mean of a 
random variable. 

vii) The probability function can be negative. 

viii) A discrete probability distribution is represented by an area graph. 

ix) If X and Y are independent random variables, then E(XY) = E(X) E(Y). 

x) If X and Y are independent random variables, then 
S.D.(X - Y) = S.D.(X) - S.D.(Y). 


Binomial and Hypergeometric 
Probability Distribution 

9.1 Introduction 

In an experiment such as tossing a coin or drawing a card from a pack of playing 
cards repeatedly, each toss or draw is called a trial. The result of each trial is 
classified as a success or a failure. The probability of success is denoted by p and the 
probability of failure is denoted by q, where q = 1 - p or q + p = 1. Let an experiment be 
repeated n times; the number of successes obtained in the n trials is 
denoted by x and the number of failures by n - x. 

A trial having only two possible outcomes, i.e., success and failure, is called a 
Bernoulli trial. For Bernoulli trials, the probability of success remains the same 
from trial to trial and the successive trials are independent. 


An experiment in which the outcomes can be classified as success or failure 
and in which the probability of success remains constant from trial to trial is called 
Binomial experiment. A binomial experiment possesses the following properties: 


i) Each trial of the experiment results in an outcome which can be classified into 
two categories, i.e., success and failure. 

ii) The probability of success remains constant from one trial of the experiment 
to the next. 


iii) | The repeated trials are independent. 


iv) The experiment is repeated a fixed number of times. 


9.2 Binomial Probability Distribution 


Suppose we have n independent trials, for each of which the probability of 
success is p and the probability of failure is q, with q + p = 1. The probability of 
exactly x successes is given by 

$P(X = x) = \binom{n}{x} p^x q^{n-x}$ (9.1) 

where x = 0, 1, 2, 3, ...., n. The random variable X is called the binomial variable 
and the distribution of X is called the binomial distribution. The quantities n and p are 
called the parameters of the binomial distribution. 

The binomial probability distribution is generally denoted by b(x; n, p). The 
probabilities $\binom{n}{x} q^{n-x} p^x$ are obtained by expanding the binomial $(q + p)^n$, i.e., 

$(q + p)^n = \binom{n}{0} q^n p^0 + \binom{n}{1} q^{n-1} p^1 + \binom{n}{2} q^{n-2} p^2 + \ldots + \binom{n}{n} q^0 p^n$ 

where $\binom{n}{0}, \binom{n}{1}, \binom{n}{2}, \ldots, \binom{n}{n}$ are called the binomial coefficients. 

If $p = q = \frac{1}{2}$, the binomial distribution is a symmetrical distribution. 
If $p \ne q$, the binomial distribution is a skewed distribution. 

If $p > \frac{1}{2}$, the distribution is negatively skewed. 

If $p < \frac{1}{2}$, the distribution is positively skewed. 
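
Formula (9.1) translates directly into code. The sketch below is an illustration, with the function name `binomial_pmf` assumed for this example. It evaluates the probabilities for a fair coin tossed four times and confirms that they are symmetric and add up to one.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 4, Fraction(1, 2)  # fair coin, four tosses
dist = [binomial_pmf(x, n, p) for x in range(n + 1)]
is_symmetric = dist == dist[::-1]  # expected when p = q = 1/2
```

Replacing `p` with a value other than 1/2 gives a skewed list of probabilities, in line with the remarks above.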


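Example 9.1: A fair coin is tossed 4 times. Find the probabilities of obtaining 
various numbers of heads. 

Solution: n = 4, $p = \frac{1}{2}$, $q = \frac{1}{2}$ 

$P(X = x) = \binom{4}{x} \left(\frac{1}{2}\right)^{4-x} \left(\frac{1}{2}\right)^x$, x = 0, 1, 2, 3, 4 

where x denotes the number of heads. 

$P(X = 0) = \binom{4}{0} \left(\frac{1}{2}\right)^{4-0} \left(\frac{1}{2}\right)^0 = \frac{1}{16}$ 

$P(X = 1) = \binom{4}{1} \left(\frac{1}{2}\right)^{4-1} \left(\frac{1}{2}\right)^1 = \frac{4}{16}$ 

$P(X = 2) = \binom{4}{2} \left(\frac{1}{2}\right)^{4-2} \left(\frac{1}{2}\right)^2 = \frac{6}{16}$ 

$P(X = 3) = \binom{4}{3} \left(\frac{1}{2}\right)^{4-3} \left(\frac{1}{2}\right)^3 = \frac{4}{16}$ 

$P(X = 4) = \binom{4}{4} \left(\frac{1}{2}\right)^{4-4} \left(\frac{1}{2}\right)^4 = \frac{1}{16}$ 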

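Example 9.2: A fair coin is tossed 5 times. What is the probability of getting 

i) exactly 3 heads ii) at least 3 heads iii) at most two heads? 

Solution: n = 5, $p = \frac{1}{2}$, $q = \frac{1}{2}$ 

and $P(X = x) = \binom{n}{x} q^{n-x} p^x$, x = 0, 1, 2, 3, 4, 5 

i) Exactly 3 heads 
$P(X = 3) = \binom{5}{3} \left(\frac{1}{2}\right)^{5-3} \left(\frac{1}{2}\right)^3 = \frac{10}{32}$ 

ii) At least 3 heads 
$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5)$ 
$= \binom{5}{3} \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^3 + \binom{5}{4} \left(\frac{1}{2}\right)^1 \left(\frac{1}{2}\right)^4 + \binom{5}{5} \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^5$ 
$= \frac{10}{32} + \frac{5}{32} + \frac{1}{32} = \frac{16}{32}$ 

iii) At most two heads 
$P(X \le 2) = P(X = 2) + P(X = 1) + P(X = 0)$ 
$= \binom{5}{2} \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^2 + \binom{5}{1} \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^1 + \binom{5}{0} \left(\frac{1}{2}\right)^5 \left(\frac{1}{2}\right)^0$ 
$= \frac{10}{32} + \frac{5}{32} + \frac{1}{32} = \frac{16}{32}$ 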


Example 9.3: If you toss a fair die 6 times, what is the probability of getting no 
even number? 

Solution: n = 6, $p = \frac{3}{6} = \frac{1}{2}$, $q = \frac{1}{2}$, taking an even number as a success. 

$P(X = 0) = \binom{6}{0} \left(\frac{1}{2}\right)^{6-0} \left(\frac{1}{2}\right)^0 = \frac{1}{64}$ 

9.2.1 Binomial Frequency Distribution 

If the binomial probability distribution is multiplied by the 
number of experiments N, then the distribution is called the binomial frequency 
distribution. The expected frequency of x successes in N experiments of n trials is given 
by 

$f(X = x) = N\,P(X = x) = N \binom{n}{x} q^{n-x} p^x$ 


Example 9.4: Out of 800 families with 5 children each, how many would you 
expect to have 
i) 4 boys ii) at least 3 boys iii) at most one boy? 
Solution: n = 5, N = 800, $p = \frac{1}{2}$, $q = 1 - p = \frac{1}{2}$ 

i) With four boys 
$P(X = x) = \binom{n}{x} q^{n-x} p^x$ 
$P(X = 4) = \binom{5}{4} \left(\frac{1}{2}\right)^{5-4} \left(\frac{1}{2}\right)^4 = \frac{5}{32}$ 

Hence the expected number of families with 4 boys $= 800 \times \frac{5}{32} = 125$ 

ii) With at least 3 boys 
$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5)$ 
$= \frac{10}{32} + \frac{5}{32} + \frac{1}{32} = \frac{16}{32} = 0.5$ 

Hence the expected number of families with at least three boys 
$= 800 \times \frac{16}{32} = 400$ 

iii) With at most one boy 
$P(X \le 1) = P(X = 1) + P(X = 0)$ 
$= \frac{5}{32} + \frac{1}{32} = \frac{6}{32}$ 

Hence the expected number of families with at most one boy will be 
$800 \times \frac{6}{32} = 150$ 
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Example 9.5: Five dice are tossed 96 times. Find the expected frequencies when 
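throwing of a 4, 5 or 6 is regarded as a success. 

Solution: Here, n = 5, N = 96, $p = \frac{3}{6} = \frac{1}{2}$, $q = \frac{1}{2}$ 

Expected frequencies: $N\,P(X = x) = f = 96 \times \binom{5}{x} \left(\frac{1}{2}\right)^5$ 

x | P(X = x) | Expected frequency 
0 | 1/32 | 3 
1 | 5/32 | 15 
2 | 10/32 | 30 
3 | 10/32 | 30 
4 | 5/32 | 15 
5 | 1/32 | 3 
Total | 1 | 96 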


9.2.2 Mean And Variance Of The Binomial Distribution 


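We find the mean and variance of the binomial distribution given by 
$(q + p)^n$. We know that 

Mean = E(X) = $\sum x P(x)$ (9.2) 

Variance = $E(X^2) - [E(X)]^2$ 
$= \sum x^2 P(x) - \left[\sum x P(x)\right]^2$ (9.3) 
The necessary calculations are shown in the table given below: 

x | P(x) | x P(x) | x² P(x) 
0 | $q^n$ | 0 | 0 
1 | $n p q^{n-1}$ | $n p q^{n-1}$ | $n p q^{n-1}$ 
2 | $\frac{n(n-1)}{2!} p^2 q^{n-2}$ | $n(n-1) p^2 q^{n-2}$ | $2n(n-1) p^2 q^{n-2}$ 
3 | $\frac{n(n-1)(n-2)}{3!} p^3 q^{n-3}$ | $\frac{n(n-1)(n-2)}{2!} p^3 q^{n-3}$ | $\frac{3n(n-1)(n-2)}{2!} p^3 q^{n-3}$ 
... | ... | ... | ... 
n | $p^n$ | $n p^n$ | $n^2 p^n$ 

$E(X) = \sum x P(x)$ 
$= n p q^{n-1} + n(n-1) p^2 q^{n-2} + \frac{n(n-1)(n-2)}{2!} p^3 q^{n-3} + \ldots + n p^n$ 
$= np\left[q^{n-1} + (n-1) p q^{n-2} + \frac{(n-1)(n-2)}{2!} p^2 q^{n-3} + \ldots + p^{n-1}\right]$ 
$= np\left[(q + p)^{n-1}\right]$ 
$= np$ (since p + q = 1) 
E(X) = np 

$E(X^2) = \sum x^2 P(x)$ 
$= n p q^{n-1} + 2n(n-1) p^2 q^{n-2} + \frac{3n(n-1)(n-2)}{2!} p^3 q^{n-3} + \ldots + n^2 p^n$ 
$= np\left[q^{n-1} + 2(n-1) p q^{n-2} + \frac{3(n-1)(n-2)}{2!} p^2 q^{n-3} + \ldots + n p^{n-1}\right]$ 
$= np\left[\left\{q^{n-1} + (n-1) p q^{n-2} + \frac{(n-1)(n-2)}{2!} p^2 q^{n-3} + \ldots + p^{n-1}\right\} + \left\{(n-1) p q^{n-2} + \frac{2(n-1)(n-2)}{2!} p^2 q^{n-3} + \ldots + (n-1) p^{n-1}\right\}\right]$ 
$= np\left[(q + p)^{n-1} + (n-1) p \left\{q^{n-2} + (n-2) p q^{n-3} + \ldots + p^{n-2}\right\}\right]$ 
$= np\left[(q + p)^{n-1} + (n-1) p (q + p)^{n-2}\right]$ 
$= np\left[1 + (n-1) p\right]$ 
$E(X^2) = np + n^2 p^2 - n p^2$ 

Variance = $E(X^2) - [E(X)]^2$ 
$= np + n^2 p^2 - n p^2 - (np)^2$ 
$= np - n p^2$ 
$= np(1 - p)$ 
Variance = npq (since q = 1 - p) 
And standard deviation $\sigma = \sqrt{npq}$ 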
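
The results E(X) = np and Variance = npq can be checked numerically by summing over the distribution, as in (9.2) and (9.3). The sketch below uses n = 10 and p = 3/10, which are illustrative values chosen for this check and are not taken from the text. The function name `binomial_moments` is likewise an assumption.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Mean and variance computed directly from the probabilities:
    mean = sum x P(x), variance = sum x^2 P(x) - (sum x P(x))^2."""
    q = 1 - p
    pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    second_moment = sum(x * x * px for x, px in enumerate(pmf))
    return mean, second_moment - mean ** 2

n, p = 10, Fraction(3, 10)  # illustrative values only
mean, variance = binomial_moments(n, p)
```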


Example 9.6: In a binomial distribution n = 20 and Pas. Find the mean, variance 


and standard deviation of the binomial distribution. 
Solution: Here n =20, p ==. q =1-p= = 
we know that 
Mean = np 


Oro =) $Dea= Vnpg=/48 
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Example 9.7: In a binomial distribution, mean and standard deviation were found to 
be 38 and 5.6 respectively. Find p and n. 

Solution: 
Mean = np = 38 (i) 
Standard deviation = $\sqrt{npq}$ = 5.6 (ii) 

Squaring equation (ii) 

npq = 31.36 (iii) 
Dividing equation (iii) by (i), we get 
$\frac{npq}{np} = \frac{31.36}{38}$ 
q = 0.83 
p = 1 - q = 1 - 0.83 
p = 0.17 
Putting the value of p = 0.17 in equation (i) 
np = 38 
n(0.17) = 38 
n = 224 
Hence n = 224 and p = 0.17 

Example 9.8: Is it possible to have a binomial distribution with mean = 5 and S.D. = 4? 

Solution: 
Mean = np = 5 (i) 
S.D. = $\sqrt{npq}$ = 4 (ii) 
Variance = npq = 16 (iii) 
$\frac{npq}{np} = \frac{16}{5}$ 
q = 3.2 

Since p or q cannot be greater than one, it is not possible to have a 
binomial distribution with mean 5 and standard deviation 4. 
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9.3 Hypergeometric Distribution and Hypergeometric 


Experiment 

When the successive trials are without replacement, they are dependent 
and the probability of success changes from one trial of the experiment to the other. 
Such an experiment, in which a random sample is chosen without replacement from a 
finite population, is said to be a hypergeometric experiment. 

9.3.1 Properties 

A hypergeometric experiment has the following properties: 

i) The experiment is repeated a fixed number of times. 

ii) The successive trials are dependent. 

iii) The probability of success varies from trial to trial (it is not fixed).

iv) The outcome of each trial can be classified as a success or a failure.

The random variable X representing the number of successes in a hypergeometric experiment is called a hypergeometric variable, and the probability distribution of the hypergeometric variable is called the hypergeometric distribution.
9.3.2 Hypergeometric Probability Distribution

Suppose that there are N items in total, of which k are classified as successes and (N − k) as failures, and that n items are to be selected at random without replacement, where n < N.

Let X denote the number of successes. The probability of obtaining exactly x successes and (n − x) failures is

$P(X = x) = \dfrac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$    (9.4)

where x = 0, 1, 2, ..., n. The hypergeometric probability distribution has 3 parameters n, k and N.

Mean of the hypergeometric distribution $= \dfrac{nk}{N}$

Variance of the hypergeometric distribution $= n \cdot \dfrac{k}{N}\left(1 - \dfrac{k}{N}\right)\left(\dfrac{N-n}{N-1}\right)$

Example 9.9: Determine the probability distribution for the number of white beads among 5 beads drawn at random from a bowl containing 4 white and 7 black beads. Compute the mean and variance, and check the results by using the formula.
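Solution:
N = 11, n = 5
Taking white beads as success:
k = 4
N − k = 11 − 4 = 7 (black beads)
x = 0, 1, 2, 3, 4

$P(X = x) = \dfrac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$

$P(X = 0) = \dfrac{\binom{4}{0}\binom{7}{5}}{\binom{11}{5}} = \dfrac{(1)(21)}{462} = \dfrac{21}{462}$

$P(X = 1) = \dfrac{\binom{4}{1}\binom{7}{4}}{\binom{11}{5}} = \dfrac{(4)(35)}{462} = \dfrac{140}{462}$

$P(X = 2) = \dfrac{\binom{4}{2}\binom{7}{3}}{\binom{11}{5}} = \dfrac{(6)(35)}{462} = \dfrac{210}{462}$

$P(X = 3) = \dfrac{\binom{4}{3}\binom{7}{2}}{\binom{11}{5}} = \dfrac{(4)(21)}{462} = \dfrac{84}{462}$

$P(X = 4) = \dfrac{\binom{4}{4}\binom{7}{1}}{\binom{11}{5}} = \dfrac{(1)(7)}{462} = \dfrac{7}{462}$

Probability distribution is given below:

x        f(x)       x f(x)     x² f(x)
0        21/462     0          0
1        140/462    140/462    140/462
2        210/462    420/462    840/462
3        84/462     252/462    756/462
4        7/462      28/462     112/462
Total    462/462    840/462    1848/462

Mean $= E(X) = \sum x f(x) = \dfrac{840}{462} = 1.8182$

Variance $= E(X^2) - [E(X)]^2 = \sum x^2 f(x) - \left[\sum x f(x)\right]^2$
$= \dfrac{1848}{462} - \left(\dfrac{840}{462}\right)^2$
$= 0.6942$

Checking the results:

Mean $= \dfrac{nk}{N} = \dfrac{5 \times 4}{11} = 1.8182$

Variance $= n \cdot \dfrac{k}{N}\left(1 - \dfrac{k}{N}\right)\left(\dfrac{N-n}{N-1}\right) = 5 \cdot \dfrac{4}{11} \cdot \dfrac{7}{11} \cdot \dfrac{6}{10} = 0.6942$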
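The calculation in Example 9.9 can be reproduced with exact fractions. The following is a minimal Python sketch using only the standard library; the function name `hypergeom_pmf` and the variable names are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, k, n):
    """P(X = x): x successes in n draws without replacement from
    N items of which k are successes (formula 9.4)."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))


# Example 9.9: 5 beads drawn from 4 white (success) and 7 black.
N, k, n = 11, 4, 5
dist = {x: hypergeom_pmf(x, N, k, n) for x in range(min(n, k) + 1)}

mean = sum(x * f for x, f in dist.items())
variance = sum(x * x * f for x, f in dist.items()) - mean**2

# The same quantities from the closed-form formulas.
mean_formula = Fraction(n * k, N)
variance_formula = n * Fraction(k, N) * Fraction(N - k, N) * Fraction(N - n, N - 1)

for x, f in dist.items():
    print(x, f)  # Fraction reduces to lowest terms, e.g. 21/462 -> 1/22
print(float(mean), float(variance))
print(mean == mean_formula, variance == variance_formula)
```

Working with `Fraction` rather than floats keeps the comparison between the tabulated values and the formulas exact.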
Example 9.10: Ten vegetable cans, all of the same size, have lost their labels. It is known that 5 contain tomatoes and 5 contain corn. If 5 are selected at random, what is the probability that all contain tomatoes? What is the probability that 3 or more contain tomatoes?

Solution: N = 10, n = 5
Considering the tomato cans as success:
k = 5 (tomatoes) and N − k = 10 − 5 = 5

i) All contain tomatoes.
x = 5, n − x = 5 − 5 = 0

$P(X = x) = \dfrac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$

$P(X = 5) = \dfrac{\binom{5}{5}\binom{5}{0}}{\binom{10}{5}} = \dfrac{1}{252} = 0.00397$

ii) 3 or more contain tomatoes.

$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5)$

$= \dfrac{\binom{5}{3}\binom{5}{2}}{\binom{10}{5}} + \dfrac{\binom{5}{4}\binom{5}{1}}{\binom{10}{5}} + \dfrac{\binom{5}{5}\binom{5}{0}}{\binom{10}{5}}$

$= \dfrac{100}{252} + \dfrac{25}{252} + \dfrac{1}{252} = \dfrac{126}{252}$

$P(X \ge 3) = 0.5$
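The two probabilities in Example 9.10 can be checked by adding hypergeometric terms. This is a short Python sketch under the same setup (N = 10, k = 5, n = 5); the helper names `hypergeom_pmf` and `prob_at_least` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, k, n):
    """P(X = x) for the hypergeometric distribution (formula 9.4)."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))


def prob_at_least(r, N, k, n):
    """P(X >= r), adding the terms for x = r, r + 1, ..., min(n, k)."""
    return sum(hypergeom_pmf(x, N, k, n) for x in range(r, min(n, k) + 1))


# Example 9.10: 5 cans chosen from 5 tomato (success) and 5 corn.
all_tomato = hypergeom_pmf(5, 10, 5, 5)
three_or_more = prob_at_least(3, 10, 5, 5)

print(all_tomato, float(all_tomato))
print(three_or_more, float(three_or_more))
```

Since the numbers of tomato and corn cans are equal here, the distribution is symmetric about 2.5, which is why the probability of 3 or more comes out to exactly one half.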


Exercise 9    Ans. on Page 260

9.1 Define the Binomial Probability Distribution.

9.2 What is a binomial experiment? Give its properties.

9.3 An event has the probability p = 3/8. Find the complete binomial distribution for n = 5 trials.

9.4 Find the probability that in tossing a fair coin four times there will appear
i) 4 heads ii) 1 tail and 3 heads
iii) at least 2 heads iv) at the most 2 heads.

9.5 If 20% of the bolts produced by a machine are defective, determine the probability that out of 4 bolts chosen at random
i) zero ii) 2 bolts are defective.

9.6 Given that the probability of passing an examination is 0.75, what is the probability of
i) passing at least two examinations if you take six?
ii) failing at least two examinations if you take four?

9.7 The experience of a house agent indicates that he can provide suitable accommodation for 75% of the clients who come to him. If on a particular occasion 6 clients approach him independently, calculate the probability that
i) less than 4 clients will get satisfactory accommodation.
ii) at least 5 clients will get satisfactory accommodation.

9.8 The incidence of an occupational disease in an industry is such that the workers have a 20% chance of suffering from it. What is the probability that out of 6 workmen:
i) not more than 2 will catch the disease?
ii) 4 or more will catch the disease?

9.9 If 8 coins are tossed, what is the probability that there are
i) exactly 5 heads ii) 1 to 7 heads.

9.10 If X is binomially distributed with n = 10, p = 0.4, then find the mean and variance of Y = (X − 10)/6.

9.11 If X is the number of successes with probability of success 1/4 in each of 5 independent trials, find
i) P(X = 0) ii) P(X < 3).

9.12 If 5 true dice are thrown once, determine the probability of getting 0, 1, 2, 3, 4, 5 sixes. Find the mean and variance of the probability distribution so obtained.

9.13 Five dice are tossed 96 times. Find the expected frequencies when the throwing of a 4, 5 or 6 is regarded as a success.

9.14 Four dice are thrown and the number of sixes in each throw is recorded. This is repeated 180 times. Write down the theoretical frequencies of 0, 1, 2, 3 and 4 sixes. Calculate the mean number of sixes in a single throw.

9.15 The mean and variance of the binomial distribution are 6 and 2.4 respectively. Find p and n, the two parameters of the binomial distribution.

9.16 If the probability of a defective bolt is 0.1, find the mean and standard deviation of the distribution of defective bolts in a total of 500.

9.17 If in a binomial distribution the mean is 3 and the standard deviation is 1.5, find its parameters.

9.18 Is it possible to have a binomial distribution with mean = 5 and S.D. = 3.7?

9.19 In a binomial distribution with n = 5, what is the value of the other parameter if P(X = 0) = P(X = 1)? Find the mean of the distribution.

9.20 Discuss the statement that in a binomial distribution mean = 6 and σ = 2.5.

9.21 Find the binomial distribution whose mean is 12 and standard deviation is 3.

9.22 Find the mean and variance of the binomial $(q + p)^3$.

9.23 Find the mean and standard deviation of the binomial distribution $(q + p)^n$.

9.24 Fill in the blanks.
i) Mean of the binomial distribution is ______ and its variance is ______.
ii) Binomial distribution is symmetrical when ______.
iii) Binomial distribution is used when n is ______.
iv) Binomial distribution has ______ parameters.
v) Binomial distribution is positively skewed when ______.
vi) Binomial variable is a ______ variable which can assume any of the values X = 0, 1, 2, 3, ..., n.
vii) The shape of binomial distribution depends upon the values of ______.
viii) In a binomial distribution, the experiment consists of a ______ number of trials.
ix) Mean, median and mode for binomial distribution will be equal when ______.

9.25 Five balls are drawn from a box containing 4 white and 7 black balls. If X denotes the number of black balls drawn, then obtain the probability distribution of X. Find the mean and variance of this distribution and verify the results using the formula.

9.26 Determine the probability distribution for the number of white beads among 5 beads drawn at random from a bowl containing 4 white and 7 black beads. Use this distribution to compute the mean and variance, and check the results by using the formula.

9.27 A committee of size 3 is selected from 4 men and 2 women. Find the probability distribution by hypergeometric experiment for the number of men on the committee.

9.28 A committee of size 5 is to be selected at random from 3 women and 5 men. Find the expected number of women on the committee.

9.29 Ten vegetable cans, all of the same size, have lost their labels. It is known that 5 contain tomatoes and 5 contain corn. If 5 cans are selected at random, what is the probability that:
i) all contain tomatoes. ii) 3 or more contain tomatoes.
9.30 Write T for true and F for false against each statement.

i) An experiment is called a Bernoulli experiment if it has two possible outcomes.
ii) The binomial distribution has three parameters n, p and q.
iii) A paper has 10 multiple-choice questions with 3 alternatives. Answering these questions by guesswork is a binomial experiment.
iv) If p ≠ q, then the binomial probability distribution is skewed.
v) A binomial random variable is a continuous random variable.
vi) The binomial distribution is a symmetrical distribution if p = q = 1/2.
vii) The binomial distribution is a negatively skewed distribution if p > q.
viii) The trials are independent in the hypergeometric distribution.
ix) The hypergeometric probability distribution has three parameters n, N and k.
x) The binomial distribution is used when n is large.
xi) The variance of the binomial distribution is npq.
xii) The binomial random variable cannot assume negative values.
xiii) The binomial distribution is a positively skewed distribution if p < q.


Exercise 1 


1.9 


1.10 
Exercise 2 


2.8 


2.12 


2.17 


2.18 


Exercise3 
; 3.2 
3.3 





Rico = ober o | 
Je fs UA Beh 
i) Population ii) less iii) parameter iv) Statistic
v) constant vi) zero vii) random viii) quantitative
ix) Inferential statistics x) primary data
i) T ii) F iii) F iv) T v) T vi) F vii) F viii) T ix) T x) T


ii. a) 3.45-—3.95 class boundaries of 3.7 


b) 3.45—3.945 class limits of 3.7 





St A PRE ORE 
Bot lemon 











17- 2.4 


2.5—-3.2 


3.3 -4.0 


41-4 


4.9—5.6 






5.7-6.4 


i) Process ii) four iii) lower, first, last
iv) Proportional v) Cumulative frequency vi) Histogram
vii) Vertical, no viii) Proportional ix) Frequency polygon
i) F ii) T iii) F iv) F v) F vi) F vii) T viii) T ix) F x) F


Geometric Mean =47.5675 
Average 14.2614 km/h 





me se ia he 


3.4 
3.5 
3.6 
3.9 
3.10 
3.11 
3.12 
3.13 
3.14 
3.15 
3.16 
3.17 
3.18 


3.19 
3.20 


3.21 
3.22 
3.23 
3.24 
3.25 
3.26 
3.27 
3.28 
3.29 
3.30 


3.31 


3.32 
Exercise 4 

4.5 

4.6 


Mean = 1.5 rotten potatoes, Median = 1 rotten potato, Mode = 1 rotten potato


Mean = 11.67, Median = 11.5, Mode = 12.8 
(i) Mean = 4.405, (ii) Median = 4.533 
(ii) Mode = 18 

Annual Percentage increase is 18.2% 

Mean = 74.5895 

Mean= 21.2, Geometric mean = 20.06 

Mean= 42.12, Harmonic Mean= 35.56 
Harmonic Mean= 50.51 

Harmonic Mean = 37.5 

Arithmetic Mean= 5; Numbers are b = 2,8 and a=2,8 
Threenumbersare: x,=3, x,=27, x,=72 
(i) Harmonic Mean=7.66 


(ii) (Geometric Mean) increase is 15.98% 


Mean = 11.09, Median = 11.07, Mode = 11.06
Mean = 0.7317, Median = 0.7318, Mode = 0.7319

D,= 9.73192 P,,= 0.73203 

Q,;=31.01 D,=23.783 P,=6.0125 Mode= 23.28 
Mode=0 


Average = 24.003 

Harmonic Mean = 134.77 

Weighted Mean = 72.57 

Weighted Mean = 203.4 

Mean = 18 because $\sum (x - \bar{x}) = 0$, therefore $\sum (x - 18) = 0$
1st value is 30, 2nd value is 20, 3rd value is 10

i) T ii) T iii) F iv) T v) F vi) T vii) F

i) Arithmetic Mean ii) Harmonic Mean
iii) Arithmetic Mean iv) Median
v) Geometric Mean vi) Arithmetic Mean

i) Measures of Central Tendency ii) Extreme iii) Zero iv) two equal
v) G.M vi) Equal vii) Mode viii) Bimodal ix) Identical x) Median and Mode
i) F ii) F iii) T iv) F v) T vi) T vii) F viii) F ix) T
53 ae 


ii, a) QD=5.41 b) Coefficient of SK =0.05 


M.D=5.73, CoefficientofM.D=0.09, Variance=47.855 
Range=30, Q.D=5.3, Coefficient of Q.D=0.08 


4.7 


4.8 


4.9 


4.10 


4.11 


4.12 


4.13 


4.14 


4.15 


4.16 


4.17 


4.19 


4.20 


4.21 


4.22 


4.23 


4.24 


4.25 


Q.D = 33.24, Coefficient of Q.D = 0,302 


nN 
' 


S.D = 44.687 Variance = 1996.928 C.V=9.914% 


Combined SD = 3.92 
Combined mean = 57.06, Combined SD = 8.75 


Y=2(X)+5 and S,=S, 


2.10 _110 

~ 100 =" 100.=" 
X,= 44.47 C.V, = 15.39% 
X, = 47.56 C.Vy = 11.74% 


Median = 12.7 M.D = 2.86 

M.D (eq) 21.65 

S$.D=7.99 

i) Tube B has greater absolute dispersion 


ii) Tube B has greater relative dispersion 


C.V = 14.64% 

C.V = 44.122% 

4, =0, Bz = 25.7, H; = 20.66, 
wy = 0.061, uy = 2.64, ui, = 0.564, 
2, =0, H = 2.637, Hy = 0.0811, 


C.V = 34.64%, Distribution is Symmetric 
Distribution is platykurtic as £, = 2.158 


S,= 0, the dist is symmetrical 


Mg = 1189.7 
Mi = 28.38 


Hy, = 28.301, 
4.26


4.27 
4.28 
4.29 
4.30 
4.31 


4.33 


4.34 


4.35 


4.36 


4.37 


4.38 


4.39 


4.40 


Exercise 5 
5.7 


5.8 i) 


$\mu_1 = 0$, $\mu_2 = 7.0891$, $\mu_3 = -4.99$, $\mu_4 = -4916.02$
$\mu_4$ can never be negative, therefore the given data is wrong.

-vely Skewed 

Coefficient of S, = 0.1218 

Coefficient of S, = 0.299 


X=3, CV=41% 


w= 11, By = 49, Mi, = 192, B,=1.8, B, = 1.59 
Dist I is Platykurtic 
Dist II is Leptokurtic 
i) -vely Skewed ii) -vely Skewed iii) +vely Skewed 
i) Dist I is Platykurtic ii) S.D=3 

Dist Il is Platykurtic 
H,= 1875 
i) +vely Skewed ii) -vely Skewed iii) +vely Skewed 
i) Symmetric ii) +vely Skewed iii) —vely Skewed 
iv) Symmetric v) Leptokurtic
i) F ii) F iii) T vi) F v) 
vi) T vii) T viii)  F ix) F x) 
xi) ih ai) xiii) F xiv) T xv) 
i) Called Scatter ii) Relative iii) | Median 
iv) Change v) Free, measurement vi) Consistent 
vii) Skewness viii) Equidistant ix) Zero 
x) Ratio 


48.39, 61.29, 67.74, 96.77, 119.35, 122.58, 129.03, 154.84 
100, 149.37, 155.12, 106.3, 123.23, 195.67, 164.57, 155.91, 


186.22, 193.31, 123.23 127.17, 159.06, 198.82, 255.12 


i) 


5.9 i) 
ii) 


5.10 i) 


iii) 
5.11 
5.12 
5.13 
5.14 
5.15 
5.16 
5.17 i) 

ii) 
5.18 
5.19 i) 
5.20 
5.21 
5.22 
5.23 


5.25 


5.28 




62.69, 93.63, 97.24, 66.63, 77.25, 122.66, 103.16, 97.73 

116.73, 121.17, 77.25, 79.71, 97.70, 124.63, 159.92 

100, 111.11, 122.22, 133.33, 155.56, 144.44, 166.67, 211.11, 200 
66.94, 74.38, 81.82, 89.26, 104.14, 96.70, 111.57, 141.33, 133.89 
100, 125, 150, 175, 200, 250, 225, 250, 275 

80, 100, 120, 140, 160, 200, 180, 200, 220. 

40, 50, 60, 70, 80, 106, 90, 100, 110 

100, 103.85, 100, 88.46, 94.54, 86.54, 80.77, 96.15, 78.85, 80.77 
100, 104, 110, 114, 124, 144, 146, 150, 142, 140 

100, 100, 101.07, 104.68, 108.90, 108.90 

100, 106.96, 103.175, 110.095 

100, 115.38, 126.92, 153.85 

100, 112.70, 149.74, 158.90, 186.02 

100, 91.66, 82, 81.67, 111.33, 113.33 

100, 89.07, 68.25, 69.29, 109.52, 107.62 

100, 118.79, 112.29, 139.53, 121.19 

400 ii) 400 

101.2508 

86.77 

71.83 

162.62 

Laspeyre's = 108.78 Paasche's = 109.21 


Fisher's = 108.99 


100, 99.11, 107.75, 104.1, 105.53, 126.66, 126.66, 133.83, 137.53 
104.66 


98.15 


§.29 i) Price number 








Answers Wass 
ii) Volume index number 
iii) Aggregative index number 
iv) Forecasting, seasonal, cycle 
v) General or special 
vi) 


vii) Anormal year 


Fixed base method, chain base method 


viii) Chaining process, chain base method 


ix) Chain base method 


x) Aggregative, average of relatives 


5.30i)T ii) T. iii) F iv) T 


Exercise 6 
6.1 ii) (a) ANB={} 


6.2 i) 5040 ii) 518918400 


v) 3603600 vi) 120 
6.3 i) {(1, 2) (1, 3), (4, 2), (4, 3)} 

iii) {(1, 4), (1, 1), 4, 0, (4, 4} 
6.8 i) 5 ii) = iii) 


15 
6.10 P(sum<7)=P(sum>7) = rs 


6.13 = 

«14 i) = ii) 3 ii) 
6.15 i) = ii) = ui) 
6.16 i) =: i) a ii) 


v) T 


(b) {1,2, 3, 


vi) T vii) F viii) F ix) F x) F 


4,5,7} (©) {2,4,6, 7, 8} (d) {2,4} 


iti) 367567200 iv) 60480 


vii) 10 viii) 635013559600 ix) 


ii) {(2, 1), (2, 4), (3, 1), (3, 4)} 


iv) {(2, 2), (2, 3), G, 2), (3, 3)} 


|— 


N 
N 
ple 


676 
1326 





330 


792 


See 
. 20349 
BE no 
62 oo 
. jaa 
4 74g 
6.221 -— ii 
i) 364 ) 
iii) = iv) 
6.26 0.92 6.27 
7 
6.29 ee ae 
429 
631i) 0.25 ii) 
6.33) < ii) 
6.34 ee gas 
22100 
637. 05 6.38 
la sar 6.41 
80 
6.43 0.23 6.44 
a 5 
ii — 
dt 46 
6.47 a eae 
12 
6.50 aS Ga 
63 
1 
6555... =. i 
) 35 i) 


ii) Singleton or unit


vi) Collectively exhaustive 


ix) Permutation, "p, 


655i) T ii) F 


iii) F 


13 5 16 
pict = 20 —— 
C18 Feral OE ce 6:20 1326 
33 
66640 
100 . 25 ae 50 
_—— . coe il Fe 
a 623-4) 75 ) 75 
Lins 
75 
3 ee oS 
2 18 
i) 0.0355 ii) 0.04395 
207 
0.2083 6.32 —_— 
625 
3 ase 4 : 13 2 : 9 
sel wae | — v = vi — 
ee ag eR Ne ga aS 
1 : rd 55 
. “a i iy 6.36 “we ae 
i) a ii) ot i) 16 ii) 784 
i) 1 ii) ; 6.39 4to3 or — 
; 19 ie 23 4 
_ — 6.42 - 
OR. Fas (4 42 9 
7 : 1 5 
— 6.45 i = ii = 
8 ) 8 ) 72 
A 19 
iv = 6.46 0.1 
) 27 
Lg ,30,25 
Si OF at 
_ ga 8 
36 32 
hse. ,i) Il defined, disti 
100 \e well defined, distinct 
iii) Power iv) Compound v) Mutually Exclusive 
vii) Exhaustive vin 
x) Combination 
iv) F v) F vi) T vii) F viii) T ix) F x) T










Exercise 7 
7.1 iii) a) Discrete random variable 
b) Discrete random variable 
c) Continuous random variable 
d) Continuous random variable 
7.2 m = 100, a = 21, b = 7, $x_0 = 10$
The numbers are: 17, 64, 51, 78, 45 and 52


7.3 Let the one-digit random numbers from the table be:
8 3 2 6 9 8 2 8 1 2 
so 
H ey H H T H H H - II 


The frequency of 1 head = 7 and frequency of 0 head = 3 
74 SHUT, “FT, TH, HH} 
7.6 S={BB, BM, MB, MM} and 

S={0, 1, 2} 
7.7 S={TT, HT, TH, HH} and 


S={0, 1, 2} 





as oe [oem [Pw [ow 


eS TE Ns a LI 





7.11i) Generated, random ii) Equal iii) Chance or Stochastic 
iv) Discrete v) Range, Continuous (vi) 1 
vii) a mathematical viii) Mathematical expectation 
ix) converges x) Discrete Continuous 


Exercise 8 


8.2 






1/56 15/56 | 30/56 





Text Book of Statistics 





8.3 


re | 12/35 18/35 4/35 


1/16 6/16 4/16 









MCE! ae Se eae 
15/210 80/210 90/210 24/210 1/210 





8.6 ii) a) No b) No c) | Yes d) No 
8.7 a) c=5 b) 0<C<1 
8.8 ee <6< x 
3 4 
8.9 i) 7.44 ii) a) 0.55, 1.3475 b). } 2+1,:5:39 
8.10 i) 0.2 ii) 0.6 iii) 0.6 iv) 0.7 v) 0.9
8.11 i) {3, 4, 5, 6, 7, 8, 9} ii) {1, 2, 3, 4}

8.12 {0, 1, 2, 3, 4}
8.13 y takes the value between 0 and 30, continuous 


8.14 Mean = 2.3 Variance = 2.01 


5 
8.15i.a) a=1/8 b = 
ia) 0 eae 
iia) c= 1/2 b) 0.3125 c) 0.5625 


iii. a) k=2, 13/216 




8.17 









fara 


8.18i) Probability Distribution of x 


3/8 
FM 3 A) Dal il 
280/792 | 336/792 | 112/792 8/792 792/792 
s 







ii) 336/792 iii) — 


8.19ii) k=b-a 


8.20ii) Mean = 1.94, S.D (x) = 1.44, C.V. =73.71%, 8.21 Mean 2.6, Var = 1.34 
8.22 i) a) 0.7031 b) 0.543125 c) 0 
1 15 3 
ey 9 ee 9 
De ) 16 ) i6 
8.23 i) a) 1/210 it) 0.4928 iii) 0.6476 vi) 0.6851 


8.24 Rs. 352.5 
8.25 1.875 =2 


8.27 i) 










ae Oe oe ee ea ee 
25/252 | 100/252 | 100/252 | 25/252 | 1/252 


8.28 ii) Mean = 2.875 





8.29 A’s expectation = Rs. 6.0; —__B’s expectation = 5.0 
8.30 A’s expectation = Rs. 6.0; _B’s expectation = 4.0 
C’s expectation = Rs. 2.0 


| oe Lee ee 
1/56 15/56 30/56 10/56 | 56/56 


Mean = 1.875, Variance = 0.5023 


8.31 











8.32 Probability Distribution 
eens eae ere CN Ae 
17256 | 12/256 | 54/256 | 108/256 | 81/256 


















8.33 
Mean = 0,2 
8.34 (EX) = 8/5, Variance (X) = 8.24 
E(SX+10)=18, Variance (SX + 10) = 8.24, 1:25 
8.35 Mean = 7, Variance = 5.83
8.36 i) E(X)=7 ii) EX ) = 590 iii) Variance (X) = 541 


8.37i) E(X)=0.75, E(X’) = 1.125 


8.38i) T ii) F iti) T iv) F v) T vi) T vii) F_ viii) T SX) fly). XP CE 


3125/23768 | 9375/23768 | 11250/23768 | 6750/23768 | 2025/23768 243/23768 
; 1 11 


sh 11 i 













9.4 i) = ii) : iii) = iv) = 

9.5 i) 0.4096 ii) 0.1536    9.6 i) 0.995 ii) 0.2617
9.7 i) 0.1694 ii) 0.5340    9.8 i) 0.9011 ii) 0.017
9.9 i) 56/256 ii) 254/256    9.10 Mean = −1, Variance = 0.067

9.11 i) 243/1024 ii) 1008/1024


9.12
x       0            1            2            3           4          5
P(x)    3125/7776    3125/7776    1250/7776    250/7776    25/7776    1/7776

Mean = 0.833, Variance = 0.6945
9.13 3, 15, 30, 30, 15,3 
9.14 





9.15 p = 0.6, n = 10    9.16 Mean = 50, S.D = 6.7082




9.17 p = 0.25, n = 12    9.18 Not possible, q cannot be greater than 1
9.19 p = 1/6, Mean = 5/6
9.20 Not possible, q cannot be greater than 1    9.21 p = 0.25, q = 0.75, n = 48

$P(X = x) = \binom{48}{x}(0.25)^x(0.75)^{48-x}$ for x = 0, 1, 2, 3, ..., 48

9.22 Mean = 3p, Variance = 3pq    9.23 Mean = np, S.D = $\sqrt{npq}$
9.24 i) np, npq ii) p = q = 1/2 iii) n is small iv) Two v) p < q
vi) Discrete r.v. vii) SD viii) fixed, independent ix) p = q








Mean = 70/66 == Variance = 0.6942 
Mean eye Variance = n-a-2( 4 =| 0.6942 
N tl N N\N-1 










x       0         1          2          3         4
f(x)    21/462    140/462    210/462    84/462    7/462

Mean = 1.8181, Variance = 0.6942
Mean $= \dfrac{nk}{N} = 1.8181$, Variance = 0.6942
9.27 





9.28 Expected number of women = 15/8 9.29 i) 1/252 ii) 126/252 
9.30 i) T ii) F iii) T iv) T v) F vi) T vii) T viii) F ix) T x) T xi) T xii) T xiii) T